Novel means and methods for the preparation and activation of nucleoside 
and nucleotide based drugs 

The present invention relates to novel means and methods for the preparation and 
activation of nucleoside and nucleotide based drugs. In particular, the present 
invention involves a method for the production of a polypeptide having or having 
enhanced kinase activity for a nucleoside or nucleotide analog and to polypeptides 
obtainable by said method. The present invention also relates to polynucleotides 
and vectors encoding said polypeptide obtainable by the method of the invention 
as well as to host cells transformed therewith. Antibodies against said polypeptide 
are also within the scope of the present invention. The present invention 
additionally relates to pharmaceutical and diagnostic compositions as well as kits 
comprising proteins having kinase activity for a nucleoside or nucleotide analog or 
the before described polypeptides, polynucleotides, vectors and antibodies. 
Furthermore, the present invention relates to the use of the before described 
proteins, polypeptides, polynucleotides, vectors and antibodies for the preparation 
of pharmaceutical compositions for treating, preventing and/or delaying a disease 
related to viral infection or cancer. In addition, the present invention relates to a 
method for identifying inhibitors of nucleoside or nucleotide kinases and to 
methods for identifying nucleoside or nucleotide based prodrugs employing the 
above mentioned polypeptides, polynucleotides, vectors and host cells. Also the 
invention relates to the compounds identifiable by said methods as well as to 
pharmaceutical and diagnostic compositions comprising said inhibitors. Moreover, 
the present invention relates to the use of proteins and polypeptides having 
nucleoside or nucleotide kinase activity or their encoding polynucleotides or 
vectors for the preparation of nucleoside or nucleotide phosphates or analogs and 
derivatives thereof. 

Several documents are cited throughout the text of this specification. Each of the 
documents cited herein (including any manufacturer's specifications, instructions, 
etc.) is hereby incorporated herein by reference; however, there is no admission 
that any document cited is indeed prior art as to the present invention. 

The class of medicinal agents termed prodrugs are compounds that exert a 
desired biological effect only after some sort of modification inside the body. AZT 
and d4T are examples of nucleoside analog prodrugs that attain their medicinal 
activity only after being phosphorylated three times to their triphosphate form by 
cellular kinases. Thus, the efficacy of these compounds is determined in part by 
the efficiency of the kinases that activate them. Compounds that are poorly 
activated (as is the case for AZT) do not achieve their full therapeutic potential. 
Moreover, the intermediate metabolites between the administered prodrug and the 
active triphosphate form can be toxic. Therefore, it is very important to use 
prodrugs that are efficiently transformed to their active form or to improve the 
activity of the rate limiting enzyme(s) in the activation pathway of a prodrug. The 
latter approach, however, was challenged in the art as being naive and doubtful to 
work at all; see, e.g., Balzarini, Nature Medicine 4 (1998), 2. Recently, Guettari 
(Virology 235 (1997), 398-405) described the improvement of AZT metabolism by 
use of the Herpes Simplex virus-1 thymidine kinase (HSV-1 TK) and suggested 
that gene transfer might be envisioned for genetic pharmacomodulation of 
anti-viral drugs. However, the extent to which HSV-1 TK improved AZT metabolism 
was only about 7-fold, and no general method had been presented for improving 
or creating kinase proteins or prodrugs that are suitable for a therapeutic 
approach. In two other recent reports (Lavie, Nature Struct. Biol. 4 (1997), 
601-604; Lavie, Nature Med. 3 (1997), 922-924) it was shown that the P-loop of 
the yeast TmpK is involved in limiting the conversion of AZT to AZTTP. The 
conclusions of the crystallographic studies presented in these reports with the yeast 
TmpK enzyme have, however, been questioned, in particular whether other TmpK 
enzymes such as the human counterpart have the problem with the P-loop 
observed with the yeast enzyme (Kenyon, Nature Struct. Biol. 4 (1997), 
595-597). Hence, each enzyme seems to present a different situation, and no general 
method has been presented for modeling the catalytic domain of a kinase enzyme 
so as to obtain a high enzymatic activity with nucleoside or nucleotide analogs, 
such as AZT; nor was it even clear to what extent such method was required or 
possible. 

Thus, the technical problem of the present invention is to provide means and 
methods for the preparation and activation of nucleoside and nucleotide based 
drugs. 

The solution to this technical problem is achieved by providing the embodiments 
characterized in the claims. 

Accordingly, the invention relates to a method for the production of a polypeptide 
having or having enhanced kinase activity for a nucleoside or nucleotide analog, 
said method comprising substituting, adding or deleting at least one amino acid of 
a protein having nucleoside or nucleotide kinase activity at a position in the protein 
where: 

(a) the amino acid is at position X2 and/or X3 in the consensus sequence 
GX1X2X3X4GK of the P-loop; 

(b) the amino acid is in the LID region; and/or 

(c) the amino acid is at position 105 in the amino acid sequence of human 
thymidylate kinase or at a corresponding position in a protein having 
nucleoside or nucleotide kinase activity. 
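
The positions X2 and X3 named in (a) can be located in a given amino acid sequence by a simple pattern search. The following is a minimal sketch only, assuming that the consensus GX1X2X3X4GK is matched literally and that X1 to X4 may be any residue; the function name and the test sequence are hypothetical and are not taken from SEQ ID NOS: 1 to 13.

```python
import re

# Consensus GX1X2X3X4GK of the P-loop; X1..X4 are treated as "any residue".
# The lookahead allows overlapping candidate motifs to be reported.
P_LOOP = re.compile(r"(?=(G(.)(.)(.)(.)GK))")


def find_p_loop_positions(sequence):
    """Return one dict per P-loop match with 1-based positions of X2 and X3."""
    hits = []
    for match in P_LOOP.finditer(sequence.upper()):
        start = match.start(1)  # 0-based index of the leading glycine
        hits.append({
            "motif": match.group(1),
            "x2": (start + 3, match.group(3)),  # (1-based position, residue)
            "x3": (start + 4, match.group(4)),
        })
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence used for illustration only.
    for hit in find_p_loop_positions("MKFITFEGLDGSGKTT"):
        print(hit)
```

A literal pattern search of this kind only proposes candidate positions; whether a match is a genuine P-loop would still have to be confirmed by sequence alignment or structural data.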

In context with the present invention, the term "nucleoside or nucleotide analog" 
refers to naturally occurring nucleosides that are modified at the sugar/base while 
nucleotide analogs may additionally have modifications, e.g., of the α-phosphate. 
Usually, said nucleoside is adenosine, cytidine, guanosine, thymidine or uridine or 
based on any of these, such as inosine. Nucleoside and nucleotide analogs useful 
in accordance with the present invention include 2',3'-dideoxynucleoside analogs 
and other analogs lacking a hydroxyl group equivalent to the 3'-OH of natural 
nucleosides in terms of their reactivity with reverse transcriptase or other DNA 
polymerases. Preferably said nucleotide is a nucleoside monophosphate, most 
preferably said nucleoside monophosphate is thymidylate. A prominent example 
for a nucleoside (thymidine) analog is 3'-azido-3'-deoxythymidine (AZT) used in the 
treatment of AIDS. 

The term "kinase activity for a nucleoside or nucleotide analog", within the 
meaning of the present invention refers to the capability of an enzyme to catalyze 
the phosphorylation of a nucleoside or nucleotide analog to its corresponding 
mono-, di- or triphosphate. Naturally, such enzymes, also termed kinases herein, 
transfer a phosphoryl group from a donor molecule (usually ATP) to an acceptor 
molecule (in the present invention a nucleoside, nucleotide monophosphate, or a 
nucleotide diphosphate). In the case of thymidylate kinase, the physiological 
substrates are ATP and dTMP resulting in ADP and dTDP. AZT is very similar to 
dTMP but has an azido moiety at the 3' position instead of a hydroxyl group, and is 
thus also activated by the same enzymes that activate thymidine to thymidine 
triphosphate (dTTP, a substrate for DNA polymerases). 

The term "P-loop", as used herein means a motif that has been identified in many 
ATP- and GTP-binding proteins (Saraste et al., 1990). The binding of the 
nucleoside or nucleotide to the P-loop is through main-chain nitrogen atoms to the 
phosphates of the nucleotide, and through a strictly conserved lysine. A 
comparison of the amino acid sequences of, for example, thymidylate kinases 
reveals the consensus sequence recited in (a) above. However, other kinases 
display this motif as well and could thus be used in accordance with the present 
invention. 

The term "LID region" refers to the mobile part of the kinase enzyme that in case of 
so called class II (type II) TmpKs (bacterial TmpKs such as from E. coli) carries 
one or more catalytically important arginine or lysine residues and undergoes 
substantial structural rearrangement upon substrate binding. Accordingly, the LID 
region of class I (type I) TmpKs (eukaryotic enzymes such as human or yeast 
TmpK) constitutes the same flexible region without the catalytically important 
arginine or lysine residues. As will be appreciated by the person skilled in the art 
from the disclosure herein, the P-loop and/or the LID regions of class II (type II) and 
class I (type I) enzymes can be transferred into each other via, e.g., amino acid 
substitution(s), addition(s) and/or deletion(s). Thus, according to the method of 
the present invention, any protein having nucleoside or nucleotide kinase activity 
can be used and modified so as to obtain a polypeptide which displays said kinase 
activity for a nucleoside or nucleotide analog. Moreover, if said protein already 
displays such activity at a basal level, it is possible to improve its kinase activity for 
nucleoside and nucleotide analogs compared to the wildtype protein. Furthermore, 
it is possible using the teaching of the present invention to confer to a polypeptide 
kinase activity for a nucleoside and nucleotide analog via, e.g., protein design 
using, for example, computer based redesign of proteins. The resultant proteins 
and polypeptides obtainable by the method of the present invention can then be 
tested for their kinase activity by using methods known in the art as described, 
e.g., in the appended examples or in Guettari (1997). 

The present invention is based on the following observations. AZT is a prodrug 
widely used in the treatment of HIV infection. While the concentration of the active 
form of AZT, AZT triphosphate (AZT-TP), reaches only the low micromolar range 
in human cells exposed to AZT, AZT monophosphate (AZT-MP) accumulates to 
millimolar concentration. Thus, not only are suboptimal concentrations of the active 
form of the drug produced, but as a further consequence, a highly toxic 
intermediate (AZT-MP) is produced at high concentration. The reason behind this 
accumulation of AZT-MP is the poor phosphorylation of AZT-MP to AZT-DP by the 
enzyme thymidylate kinase (Furman, Proc. Natl. Acad. Sci. USA 83 (1986), 8333- 
8337). 

In accordance with the present invention, it has surprisingly been found that the 
reason behind this low activity of the enzyme is to be found in the interplay of two 
motifs, the P-loop and the LID region. In experiments performed in accordance 
with the present invention, the crystal structure of yeast thymidylate kinase (TmpK) 
complexed with the bisubstrate inhibitor P1-(5'-adenosyl) P5-(5'-thymidyl) 
pentaphosphate (TP5A) was determined at 2.0 Å resolution. In this complex, TmpK 
adopts a closed conformation with a region (LID) of the protein closing upon the 
substrate and forming a helix. The interactions of TmpK and TP5A revealed that 
arginine 15, which is located in the phosphate binding loop (P-loop) sequence, 
plays a catalytic role by interacting with an oxygen atom of the transferred 
phosphoryl group. Unlike other nucleoside monophosphate kinases where basic 
residues from the LID region participate in stabilizing the transition state, class I 
(type I) TmpK lack such residues in the LID region. The present inventors attribute 
this function to Arg15 of the P-loop. TmpK plays an important role in the 
phosphorylation of the AIDS prodrug AZT. The structures of TmpK with dTMP and 
with AZT-MP (Lavie et al., Nature Structural Biology 4 (1997), 601-604) implicate 
the movement of the Arg15 in response to AZT-MP binding as an important factor 
for the 200-fold reduced catalytic rate with AZT-MP. TmpK from E. coli lacks this 
arginine in its P-loop while having basic residues in the LID region. This suggested 
that if such a P-loop movement were to occur in the E. coli TmpK upon AZT-MP 
binding, it should not have such a detrimental effect on catalysis. This hypothesis 
was tested and, as expected by the inventors, E. coli TmpK phosphorylates 
AZT-MP only 2.5 times slower than dTMP. 

Based on the understanding gained from the present work it became possible to 
mutate the wild type enzyme in order to attain higher activity for AZT-MP. With the 
advent of gene therapy, it is now conceivable to transfer the gene of such a 
modified kinase to the HIV-susceptible cells in an infected individual. The 
subsequent administration of AZT should result in a higher therapeutic index 
compared with an individual possessing only the wild type kinase. 

The next few paragraphs will further illustrate the invention by way of example for 
TmpKs and describe how kinases catalyze the phosphoryl transfer reaction, and 
list the possible changes the person skilled in the art can consider in accordance 
with the present invention, to achieve (higher) kinase activity for nucleoside or 
nucleotide analogs. 

Kinases transfer a phosphoryl group from a donor molecule (usually ATP) to an 
acceptor molecule (in the present invention a nucleoside, nucleotide 
monophosphate, or a nucleotide diphosphate). In the case of thymidylate kinase, 
the physiological substrates are ATP and dTMP resulting in ADP and dTDP. AZT 
is very similar to dTMP but has an azido moiety at the 3' position instead of a 
hydroxyl group, and is thus also activated by the same enzymes that activate 
thymidine to thymidine triphosphate (dTTP, a substrate for DNA polymerases). 
Enzymes catalyze chemical reactions by preferentially stabilizing the transition 
state of a reaction over the corresponding ground state. Since it is highly probable 
that in phosphoryl transfer reactions an additional negative charge is formed at the 
transition state in comparison to the total charge of the ground state, stabilizing 
such a charge lowers the energy barrier for the reaction. Enzymes can stabilize 
negative charges either by using positively charged amino acid residues (Arg, Lys, 
or His) or by binding metals, or both. Kinases such as thymidylate kinases utilize 
both of these possibilities, but since the interaction with the metal (a bound 
magnesium ion) is probably similar in the ground and transition states, the energy 
barrier between ground and transition state is not appreciably decreased (this 
does not mean that the metal is not important for catalysis; the absence of metal 
abolishes all activity). It is the positive side chains of arginine residue(s) that change 
their degree of interaction between the ground and transition states and thus 
achieve preferential stabilization of the transition state over the ground state (i.e. a 
lowering of the energy barrier for the phosphoryl transfer). 
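
The link between the rate differences discussed above and the height of the energy barrier can be made concrete with a short calculation. The sketch below is an illustration only: it assumes the usual transition state theory relation in which the rate constant is proportional to exp(-ΔG‡/RT), and a temperature of 25 °C; neither assumption is stated in this specification, and the function name is hypothetical.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def barrier_change_kcal(fold_change, temperature_k=298.15):
    """Change in activation free energy (kcal/mol) equivalent to a fold change in rate.

    Assumes k is proportional to exp(-dG/RT), so ddG = RT * ln(fold_change).
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return R_KCAL * temperature_k * math.log(fold_change)


if __name__ == "__main__":
    # Fold changes taken from the text: 200-fold (yeast TmpK, AZT-MP versus dTMP)
    # and 2.5-fold (E. coli TmpK, AZT-MP versus dTMP).
    for fold in (200.0, 2.5):
        print(fold, round(barrier_change_kcal(fold), 2))
```

Under these assumptions, the 200-fold reduction reported for the yeast enzyme corresponds to roughly 3.1 kcal/mol of lost transition state stabilization, whereas the 2.5-fold difference of the E. coli enzyme corresponds to about 0.5 kcal/mol.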

The structure of thymidylate kinase complexed with the bisubstrate analog TP5A 
that has been solved in accordance with the present invention makes it possible to 
observe which basic residues interact with the nucleotides. To date a unique feature of 
class I (type I) TmpKs is that an arginine from the P-loop interacts directly with the 
transferred phosphoryl group. The present inventors postulate that this interaction 
is responsible for the low activity of TmpK with AZT-MP since the azido group of 
AZT pushes the P-loop from its position in comparison to its position when dTMP 
is bound. 

An important observation made in accordance with the present invention is that the 
E. coli TmpK (a class II (type II) enzyme) phosphorylates AZT-MP very rapidly. The 
E. coli TmpK has no arginine in its P-loop (a glycine instead) but rather a number 
of basic residues in a structural motif that is called LID region. This is an example of 
the scaffold nature of enzymes: Class I (type I) TmpKs have a basic residue in the 
P-loop motif but none in the LID, whereas class II (type II) TmpKs have the 
reverse; i.e. none in the P-loop and a few in the LID. In other words, class I (type I) 
kinases use the P-loop arginine to stabilize the transition state whereas class II 
(type II) use arginine(s) from the LID for the same purpose. 

The above result has immediate implications. Instead of modifying the human or 
other class I (type I) kinase to better phosphorylate AZT-MP, the E. coli TmpK 
could be used for this purpose (and possibly any other class II (type II) TmpK). The 
kinetic results described in the appended examples demonstrate that the E. coli 
TmpK is at least 300-fold faster in phosphorylating AZT-MP than the human TmpK. 
However, because of immunological and other reasons it might be advantageous 
to use a modified human TmpK over the E. coli TmpK in the proposed gene 
therapy procedure. The modification of the human (or any other class I (type I)) 
TmpK would be according to the method of the invention as described above. 

Advantageously the method of the invention comprises at least one of the 
following steps: 

• Mutation of the LID region of class I (type I) thymidylate kinases to contain 
one or more basic residues and in doing so mimic the LID motif of class II 
(type II) TmpKs. This mutation of the LID could be with or without a 
concomitant mutation in the P-loop. A concomitant substitution in the P-loop 
(arginine to a smaller amino acid) might be advantageous in order to avoid a 
steric clash between the P-loop basic residue and the newly introduced LID 
basic residue(s). The results obtained in accordance with the present 
invention demonstrate that such an approach is appropriate but they do not 
rule out that, for example, a lysine in the P-loop might be tolerated. 

• Substitution of the entire LID of a class I (type I) TmpK with that from a class 
II (type II) TmpK, with or without a concomitant mutation in the P-loop; see, 
e.g., Example 8. 

• Selection of appropriate mutant enzymes by measurements of the catalytic 
efficiency with a coupled colorimetric assay that links the phosphorylation 
of, e.g., the nucleoside monophosphate to diphosphate to a change of the 
optical density. This can be extended to a high throughput screening (HTS) 
setup with a microtiter plate reader. 
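
The selection step listed last can be illustrated by converting a measured change in optical density into an apparent turnover rate. The following is a minimal sketch under stated assumptions: it presumes an NADH-linked coupled assay read at 340 nm (for example via pyruvate kinase and lactate dehydrogenase), which is a common format but is not specified here, and it uses the molar absorption coefficient of NADH at 340 nm (6220 M^-1 cm^-1). The function name, the stoichiometry parameter and the example numbers are hypothetical.

```python
# Molar absorption coefficient of NADH at 340 nm, in M^-1 cm^-1.
NADH_EPSILON_340 = 6220.0


def apparent_turnover(delta_a340_per_min, enzyme_molar, path_cm=1.0,
                      nadh_per_turnover=1.0):
    """Apparent turnover rate (s^-1) from the slope of A340 versus time.

    delta_a340_per_min: absorbance change per minute (sign is ignored).
    enzyme_molar:       kinase concentration in the well or cuvette (mol/l).
    path_cm:            optical path length; depends on plate and fill volume.
    nadh_per_turnover:  NADH oxidized per kinase turnover (assay dependent).
    """
    if enzyme_molar <= 0 or path_cm <= 0 or nadh_per_turnover <= 0:
        raise ValueError("concentration, path length and stoichiometry must be positive")
    # Beer-Lambert law: concentration change = absorbance change / (epsilon * path).
    nadh_molar_per_s = abs(delta_a340_per_min) / (NADH_EPSILON_340 * path_cm) / 60.0
    return nadh_molar_per_s / (nadh_per_turnover * enzyme_molar)


if __name__ == "__main__":
    # Hypothetical reading: -0.0622 absorbance units per minute with 100 nM enzyme.
    print(apparent_turnover(-0.0622, 1e-7))
```

In a microtiter plate format the path length is usually shorter than 1 cm and should be supplied explicitly; comparing the values obtained with dTMP and with AZT-MP for each mutant then gives the ranking used for selection.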

The method of the invention can be performed using conventional techniques 
known in the art, for example, by using amino acid deletion(s), insertion(s), 
substitution(s), addition(s), and/or recombination(s) and/or any other 
modification(s) known in the art either alone or in combination. Methods for 
introducing such modifications in the DNA sequence underlying the amino acid 
sequence of a protein having kinase activity are well known to the person skilled in 
the art; see, e.g., Sambrook, Molecular Cloning: A Laboratory Manual, Cold Spring 
Harbor Laboratory (1989) N.Y. 

In a preferred embodiment of the method of the invention said protein to be 
modified with the method described above is derived from a eukaryotic or 
prokaryotic organism, preferably said organism is human or yeast. As described 
above, the present inventors have solved the 3-dimensional structure of TmpK 
complexed with the bisubstrate inhibitor P1-(5'-adenosyl) P5-(5'-thymidyl) 
pentaphosphate (TP5A). This structure, taken together with those of TmpK 
complexed with dTMP and with AZT-MP (Lavie et al., 1997b) allowed the inventors 
to identify the residues important for catalysis, and to propose an explanation for 
the slow phosphorylation of AZT-MP. In addition, based on that knowledge, it 
permits the prediction that the rate of AZT-MP phosphorylation by the E. coli TmpK 
would be similar to the phosphorylation rate of the physiological substrate dTMP. 
In fact, as shown in accordance with the present invention E. coli TmpK 
discriminates poorly against AZT-MP, supporting the inventors' finding that Arg15 
of yeast and human thymidylate kinases is responsible for discrimination against 
AZT-MP as a substrate (the human thymidylate kinase has 44% amino acid 
identity and >63% similarity with the yeast enzyme with all catalytically important 
residues conserved between the two species; see Figure 4). Furthermore, as 
demonstrated in Example 8 herein below the yeast TmpK could, by applying the 
method of the invention, be converted into an enzyme displaying high kinase activity 
for the nucleoside analog AZT. Similar results are to be expected when using the 
human or any other corresponding eukaryotic enzyme. 

In a preferred embodiment of the method of the invention said protein comprises 
the amino acid sequence of any one of SEQ ID NOS: 1 to 13 or a fragment 
thereof. The amino acid sequences depicted in SEQ ID NOS: 1 to 13 belong to 
various TmpKs of eukaryotic and prokaryotic origin and comprise, for example, the 
amino acid sequence of E. coli TmpK. With the method of the present invention 
and the teaching provided herein, it is, e.g., possible to improve the kinase activity 
of the human or yeast enzyme for nucleoside or nucleotide analogs, in particular 
for AZT. Furthermore, the teaching of the present invention now enables rational 
shuffling of domains from, e.g., E. coli TmpK to the corresponding enzyme from 
human, mouse or yeast. On the other hand it is also possible to modify the amino 
acid sequence of the E. coli enzyme to more closely resemble that of the 
corresponding eukaryotic, preferably human enzyme while the P-loop and the LID 
region remain substantially unaffected and therefore the resultant polypeptide 
retains its kinase activity for nucleoside and nucleotide analogs. For the rational 
design of polypeptides produced according to the method of the invention 
computer programs may be used such as BRASMOL that are obtainable from the 
Internet. Furthermore, folding simulations and computer redesign of structural 
motifs can be performed using other appropriate computer programs (Olszewski, 
Proteins 25 (1996), 286-299; Hoffman, Comput. Appl. Biosci. 11 (1995), 675-679). 
Computers can be used for the conformational and energetic analysis of detailed 
protein models (Monge, J. Mol. Biol. 247 (1995), 995-1012; Renouf, Adv. Exp. 
Med. Biol. 376 (1995), 37-45). 

In another preferred embodiment of the method of the present invention said 
amino acid which is substituted or added in (a) is glycine or lysine. In case that the 
amino acid is changed in the P-loop of class I (type I) enzymes the amino acid that 
is changed is preferably a small amino acid such as glycine. However, alanine, 
less preferably valine, threonine, serine or glutamic acid may work as well although 
probably less effectively. As described above, the LID region of class I (type I) 
thymidylate kinases may be substituted in accordance with the method of the 
invention to contain at least one basic residue. Thus, in a preferred embodiment of 
the method of the invention said amino acid which is substituted or added in (b) is 
a basic amino acid, preferably arginine. Usually, said LID has the consensus 
sequence R/KXXXXXERYEXXXXQ. This consensus sequence or substantially 
identical sequences can be determined by comparison of the amino acid 
sequences of known nucleoside and nucleotide kinases, for example those shown 
in SEQ ID NOS: 1 to 13. Preferably, said polypeptide obtainable by the method of 
the invention exhibits kinase activity for a nucleoside or nucleotide analog which is 
higher than that of the corresponding wild type enzyme, preferably higher than that 
of a eukaryotic enzyme, most preferably higher than that of the human enzyme. 
Preferably, said kinase activity for a nucleoside or nucleotide analog is 5-fold, more 
preferably 10-fold, still more preferably 30-fold and most preferably 300-fold 
improved compared to the corresponding wild-type enzyme. Furthermore, it is 
preferred within the method of the invention that said nucleoside analog is AZT, 
d4T or has the following structure: 

[Structural formulae not reproduced in this text; they contain the substituents B and X.] 

wherein B is any nucleobase or analog thereof, and X is O, CH2, NH or S. 

As discussed above, the amino acid substitution(s) or other modifications 
performed in the amino acid sequence of a given kinase may result in the P-loop 
and/or in the LID region of a bacterial nucleoside or nucleotide kinase, preferably 
those of the TmpK of E. coli; see also Example 8. 
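
Whether a modified sequence carries a LID region of the type described above can be checked against the LID consensus sequence R/KXXXXXERYEXXXXQ given in this description. The sketch below is illustrative only: it matches the consensus literally, although the description also allows substantially identical sequences, and it reports the basic residues (arginine or lysine) found within each match. The function name and the test sequence are hypothetical.

```python
import re

# LID consensus R/KXXXXXERYEXXXXQ: R or K, five residues, ERYE, four residues, Q.
LID = re.compile(r"(?=([RK].{5}ERYE.{4}Q))")


def find_lid_regions(sequence):
    """Return candidate LID regions with 1-based coordinates and their basic residues."""
    regions = []
    for match in LID.finditer(sequence.upper()):
        start = match.start(1)          # 0-based start of the match
        segment = match.group(1)
        basic = [(start + offset + 1, residue)
                 for offset, residue in enumerate(segment) if residue in "RK"]
        regions.append({
            "start": start + 1,
            "end": start + len(segment),
            "segment": segment,
            "basic": basic,
        })
    return regions


if __name__ == "__main__":
    # Hypothetical toy sequence used for illustration only.
    for region in find_lid_regions("AAARGGSLDERYEAGLAQTT"):
        print(region)
```

An empty result does not show that a protein lacks a LID region, only that the literal consensus is absent; a looser pattern or an alignment against SEQ ID NOS: 1 to 13 would be needed in that case.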

As mentioned before, for the long term aim of gene therapeutic potentiation of AZT 
effectiveness, there could be significant advantages from use of a modified human 
enzyme. Therefore, the mutational studies were extended to the human enzyme; 
see Example 9. Contrary to all expectations arising from the studies on the yeast 
enzyme, replacement of Arg-16 (equivalent to Arg-15 in the yeast enzyme) did not 
lead to a loss in catalytic activity, but to a slight gain. It is possible that this is 
related to the fact that the kcat for the human enzyme is much lower than for the 
yeast enzyme (0.67 cf. 35 s^-1), which is already a very slow kinase when compared 
with other nucleoside monophosphate kinases. Without intending to be bound by 
theory it is believed that, since the catalytic machinery in both enzymes appears to 
be identical, as shown by sequence comparison and 3-D structural determination, 
it is possible that the chemical step (i.e. phosphate transfer) is not rate limiting, but 
rather product release, so that slowing down a relatively rapid chemical step might 
not have any influence on the overall rate. 

In keeping with expectations arising from the yeast TMPK data was the fact that 
introduction of the E. coli lid region without removing Arg-16 led to a drop in 
catalytic activity. However, and most dramatically, a combination of introduction of 
the E. coli lid with replacement of Arg-16 by glycine not only restored more than full 
wild type catalytic activity, but resulted in a protein which is more efficient with 
AZT-MP than with TMP. The increase in activity with AZT-MP is approximately 
300-fold. This mutant thus has properties which are highly attractive for improving the 
potency of AZT, since the efficiency of AZTMP phosphorylation is improved 
dramatically with only a minor increase in TMP phosphorylation activity. The latter 
is an important aspect, since AZTTP must compete with TTP for HIV-reverse 
transcriptase catalysed addition to the end of a growing (HIV) DNA chain. 

The mutants of TMPK described so far showing altered specificity for AZTMP and 
TMP were produced according to rational considerations based on comparative 
structure-function studies of TMPK from three different sources. In this respect, a 
further striking result is that obtained by replacing Phe-105 in the human enzyme 
by tyrosine. The rationale for this was that in the yeast enzyme, the corresponding 
residue (at position 102) is a tyrosine, and its hydroxyl group interacts with the 
carboxylate side chain of Asp-14, which (like Asp-15 in the human enzyme) 
appears to be an essential residue for the catalytic mechanism. The Phe-105 Tyr 
mutant of the human enzyme shows reduced activity with TMP, but, unexpectedly, 
significantly increased activity with AZTMP, so that the latter is now a better 
substrate than the former, see Example 9. Thus, and again without intending to be 
bound by theory it is believed that substitution of amino acid(s) that interact with 
the LID or P-loop, and in particular, if present, with the carboxylate side chain of 
Asp leads to a conformational change of the catalytic center of the kinase enzyme 
and fulfils the requirement of greatly increased AZTMP phosphorylating activity 
without increasing (actually decreasing) TMP phosphorylation. An amino acid 
position in a nucleotide or nucleoside kinase corresponding to amino acid residue 
105 in the human thymidylate kinase can be determined by comparison and 
alignment of the amino acid sequences of known nucleoside and nucleotide 
kinases, for example, those shown in SEQ ID NOS: 1 to 13 using, e.g., the 
program GCG. The specificity ratio (AZTMP/TMP) is actually identical with that of 
the mutant containing Gly-16 and the E. coli lid, although the overall activity is a 
factor of ca. 10 lower. 
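
The determination of a corresponding position by alignment, as described in the preceding paragraph, can be sketched as follows. The example assumes that a pairwise alignment has already been produced by an alignment program and is available as two gapped strings of equal length, with "-" as the gap character; the function name and the toy alignment are hypothetical and do not reproduce any of SEQ ID NOS: 1 to 13.

```python
def corresponding_position(aligned_ref, aligned_target, ref_position):
    """Map a 1-based residue position in the reference onto the target sequence.

    aligned_ref / aligned_target: gapped sequences of equal length ('-' = gap).
    Returns (target_position, target_residue), or None if the reference residue
    is aligned to a gap in the target.
    """
    if len(aligned_ref) != len(aligned_target):
        raise ValueError("aligned sequences must have the same length")
    ref_count = 0
    target_count = 0
    for ref_residue, target_residue in zip(aligned_ref, aligned_target):
        if ref_residue != "-":
            ref_count += 1
        if target_residue != "-":
            target_count += 1
        # Stop at the alignment column that holds the requested reference residue.
        if ref_residue != "-" and ref_count == ref_position:
            if target_residue == "-":
                return None
            return target_count, target_residue
    raise ValueError("ref_position lies outside the reference sequence")


if __name__ == "__main__":
    # Hypothetical toy alignment used for illustration only.
    reference = "MA-GRFL"
    target = "MKTG-YL"
    print(corresponding_position(reference, target, 5))
```

Applied to an alignment of the human and yeast enzymes, a mapping of this kind is what relates Phe-105 of the human thymidylate kinase to the tyrosine at position 102 of the yeast enzyme mentioned above.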

The results obtained in accordance with the present invention show how rational 
considerations have led to generation of TMPK mutants capable of 
phosphorylating the nucleoside pro-drug AZT with high efficiency. This finding 
gives rise to several applications in the medical and diagnostic field which will be 
explained in more detail below. 

In another embodiment the present invention relates to a polynucleotide encoding 
the polypeptide obtainable by the method of the invention. Said polynucleotide 
may be, e.g., DNA, cDNA, genomic DNA, RNA or synthetically produced DNA or 
RNA or a recombinantly produced chimeric nucleic acid molecule comprising any 
of those polynucleotides either alone or in combination. Preferably said 
polynucleotide is part of a vector. Such vectors may comprise further genes such 
as marker genes which allow for the selection of said vector in a suitable host cell 
and under suitable conditions. 

In a further preferred embodiment the polynucleotide of the invention is operatively 
linked to expression control sequences allowing expression in prokaryotic or 
eukaryotic cells. 

Expression of said polynucleotide comprises transcription of the polynucleotide 
into a translatable mRNA. Regulatory elements ensuring expression in eukaryotic 
cells, preferably mammalian cells, are well known to those skilled in the art. They 
usually comprise regulatory sequences ensuring initiation of transcription and 
optionally poly-A signals ensuring termination of transcription and stabilization of 
the transcript. Additional regulatory elements may include transcriptional as well as 
translational enhancers. Possible regulatory elements permitting expression in 
prokaryotic host cells comprise, e.g., the lac, trp or tac promoter in E. coli, and 
examples for regulatory elements permitting expression in eukaryotic host cells are 
the AOX1 or GAL1 promoter in yeast or the CMV-, SV40-, RSV-promoter (Rous 
sarcoma virus), CMV-enhancer, SV40-enhancer or a globin intron in mammalian 
and other animal cells. Besides elements which are responsible for the initiation of 
transcription such regulatory elements may also comprise transcription termination 
signals, such as the SV40-poly-A site or the tk-poly-A site, downstream of the 
polynucleotide. In this context, suitable expression vectors are known in the art 
such as Okayama-Berg cDNA expression vector pcDV1 (Pharmacia), pCDM8, 
pRc/CMV, pcDNA1, pcDNA3 (Invitrogen), pSPORT1 (GIBCO BRL). 

The polynucleotide of the invention can be used alone or as part of a vector to 
express the polypeptide having kinase activity for nucleoside or nucleotide analogs 
in cells, for, e.g., gene therapy or diagnostics of diseases related to viral infections 
and cancer. The polynucleotide or vector containing the DNA sequence encoding 
a kinase for nucleoside or nucleotide analogs is introduced into the cells which in 
turn produce the protein of interest. Gene therapy, which is based on introducing 
therapeutic genes into cells by ex-vivo or in-vivo techniques is one of the most 
important applications of gene transfer. Suitable vectors and methods for in-vitro or 
in-vivo gene therapy are described in the literature and are known to the person 
skilled in the art; see, e.g., Giordano, Nature Medicine 2 (1996), 534-539; Schaper, 
Circ. Res. 79 (1996), 911-919; Anderson, Science 256 (1992), 808-813; Isner, 
Lancet 348 (1996), 370-374; Muhlhauser, Circ. Res. 77 (1995), 1077-1086; Wang, 
Nature Medicine 2 (1996), 714-716; WO 94/29469; WO 97/00957 or Schaper, 
Current Opinion in Biotechnology 7 (1996), 635-640, and references cited therein. 
The polynucleotides and vectors of the invention may be designed for direct 
introduction or for introduction via liposomes, or viral vectors (e.g. adenoviral, 
retroviral) into the cell. Preferably, said cell is a germ line cell, embryonic cell, or 
egg cell or derived therefrom, most preferably said cell is a stem cell. 

Furthermore, the present invention relates to vectors, particularly plasmids, 
cosmids, viruses and bacteriophages used conventionally in genetic engineering 
that comprise a polynucleotide of the invention. Preferably, said vector is an 
expression vector and/or a gene transfer or targeting vector. Expression vectors 
derived from viruses such as retroviruses, vaccinia virus, adeno-associated virus, 
herpes viruses, or bovine papilloma virus, may be used for delivery of the 
polynucleotides or vector of the invention into a targeted cell population. Methods 
which are well known to those skilled in the art can be used to construct 
recombinant viral vectors; see, for example, the techniques described in 
Sambrook, Molecular Cloning A Laboratory Manual, Cold Spring Harbor 
Laboratory (1989) N.Y. and Ausubel, Current Protocols in Molecular Biology, 
Greene Publishing Associates and Wiley Interscience, N.Y. (1989). Alternatively, 
the polynucleotides and vectors of the invention can be reconstituted into 
liposomes for delivery to target cells. 

The present invention furthermore relates to host cells transformed with a 
polynucleotide or vector of the invention. Said host cell may be a prokaryotic or 
eukaryotic cell. The polynucleotide or vector of the invention which is present in the 
host cell may either be integrated into the genome of the host cell or it may be 
maintained extrachromosomally. In this respect, it is also to be understood that the 
recombinant DNA molecule of the invention can be used for "gene targeting" 
and/or "gene replacement", for restoring a mutant gene or for creating a mutant 
gene via homologous recombination; see for example Mouellic, Proc. Natl. Acad. 
Sci. USA, 87 (1990), 4712-4716; Joyner, Gene Targeting, A Practical Approach, 
Oxford University Press. 

The host cell can be any prokaryotic or eukaryotic cell, such as a bacterial, insect, 
fungal, plant, animal or human cell. Preferred fungal cells are, for example, those 
of the genus Saccharomyces, in particular those of the species S. cerevisiae. The 
term "prokaryotic" is meant to include all bacteria which can be transformed or 
transfected with a polynucleotide for the expression of a polypeptide of the 
invention. Prokaryotic hosts may include gram negative as well as gram positive 
bacteria such as, for example, E. coli, S. typhimurium, Serratia marcescens and 
Bacillus subtilis. A polynucleotide coding for a polypeptide of the invention can be 
used to transform or transfect the host using any of the techniques commonly 
known to those of ordinary skill in the art. Especially preferred is the use of a 
plasmid or a virus containing the coding sequence of the polypeptide having 
kinase activity for a nucleoside or nucleotide analog for purposes of prokaryotic 
transformation or transfection, respectively. Methods for preparing fused, operably 
linked genes and expressing them in bacteria are well-known in the art (Maniatis, 
et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, 
Cold Spring Harbor, NY, 1989). The genetic constructs and methods described 
therein can be utilized for expression of the polypeptide of the invention in 
prokaryotic hosts. In general, expression vectors containing promoter sequences 
which facilitate the efficient transcription of the inserted polynucleotide are used in 
connection with the host. The expression vector typically contains an origin of 
replication, a promoter, and a terminator, as well as specific genes which are 
capable of providing phenotypic selection of the transformed cells. The 
transformed prokaryotic hosts can be grown in fermentors and cultured according 
to techniques known in the art to achieve optimal cell growth. The polypeptides of 
the invention can then be isolated from the growth medium, cellular lysates, or 
cellular membrane fractions. The isolation and purification of the microbially 
expressed polypeptides of the invention may be by any conventional means such 
as, for example, preparative chromatographic separations and immunological 
separations such as those involving the use of monoclonal or polyclonal 
antibodies. 

Thus, in a further embodiment the invention relates to a method for the production 
of a polypeptide having kinase activity for a nucleoside or nucleotide analog 
comprising culturing a host cell as defined above under conditions allowing the 
expression of the polypeptide and recovering the produced polypeptide from the 
culture. 

In another embodiment the present invention relates to a method for producing 
cells capable of expressing a polypeptide having nucleoside or nucleotide kinase 
activity for nucleoside or nucleotide analogs comprising genetically engineering 
cells with the polynucleotide or with the vector of the invention. The cells 
obtainable by the method of the invention can be used, for example, to test the 
capacity of the polypeptides of the invention to improve the metabolism of 
nucleoside or nucleotide analogs and, if mammalian cells are used, to test their 
impact on anti-viral activity of nucleoside and nucleotide analogs. Furthermore, the 
cells can be used to study known and unknown nucleoside and nucleotide analogs 
for their ability to be converted to the corresponding mono-, di- or triphosphates. 
The cells obtainable by the above-described method may also be used for the 
screening methods referred to herein below. 

Furthermore, the invention relates to a polypeptide having kinase activity for a 
nucleoside or nucleotide analog encoded by a polynucleotide according to the 
invention or obtainable by the above-described methods or from cells produced by 
the method described above. 

In this context it is also understood that the polypeptides according to the invention 
may be further modified by conventional methods known in the art. By providing 
the polypeptides according to the present invention it is also possible to determine 
the portions relevant for their biological activity, namely their kinase activity. This 
may allow the construction of chimeric proteins comprising an amino acid 
sequence derived from a polypeptide of the invention which is crucial for kinase 
activity and other functional amino acid sequences e.g. nuclear localization 
signals, transactivating domains, DNA-binding domains, hormone-binding 
domains, protein tags (GST, GFP, h-myc peptide, Flag, HA peptide) which may be 
derived from the same or from heterologous proteins. 

The present invention furthermore relates to antibodies specifically recognizing a 
polypeptide according to the invention which has kinase activity for a nucleoside or 
nucleotide analog. Advantageously, the antibody specifically recognizes a 
polypeptide according to the invention which has kinase activity for a nucleoside or 
nucleotide analog but does not recognize a polypeptide which is a wild type 
starting protein of such a polypeptide and which has no or less kinase activity for 
nucleoside or nucleotide analogs than the polypeptides of the invention. 
Antibodies against the polypeptide of the invention can be prepared by well known 
methods using a purified polypeptide according to the invention or a synthetic 
fragment derived therefrom as an antigen. Monoclonal antibodies can be 
prepared, for example, by the techniques as originally described in Köhler and 
Milstein, Nature 256 (1975), 495, and Galfre, Meth. Enzymol. 73 (1981), 3, which 
comprise the fusion of mouse myeloma cells to spleen cells derived from 
immunized mammals. The antibodies can be monoclonal antibodies, polyclonal 
antibodies or synthetic antibodies as well as fragments of antibodies, such as Fab, 
Fv or scFv fragments etc. Furthermore, antibodies or fragments thereof to the 
aforementioned polypeptides can be obtained by using methods which are 
described, e.g., in Harlow and Lane "Antibodies, A Laboratory Manual", CSH 
Press, Cold Spring Harbor, 1988. These antibodies can be used, for example, for 
the immunoprecipitation and immunolocalization of the polypeptides of the 
invention as well as for the monitoring of the presence of such polypeptides, for 
example, in recombinant organisms, and for the identification of compounds 
interacting with the proteins according to the invention. For example, surface 
plasmon resonance as employed in the BIAcore system can be used to increase 
the efficiency of phage antibodies which bind to an epitope of the polypeptide of 
the invention (Schier, Human Antibodies Hybridomas 7 (1996), 97-105; Malmborg, 
J. Immunol. Methods 183 (1995), 7-13). 

Moreover, the present invention relates to a composition, preferably a 
pharmaceutical composition comprising 

(a) a prokaryotic protein having nucleoside or nucleotide kinase activity for a 
nucleoside or nucleotide analog or a polynucleotide encoding and capable 
of expressing said protein in vivo or a vector containing said polynucleotide; 
or 

(b) the above-described polypeptide, polynucleotide or vector of the invention; 

(c) optionally a nucleoside or nucleotide analog; and 

(d) optionally a pharmaceutically acceptable carrier. 

As described above and as shown in the appended examples, prokaryotic kinases 
such as E. coli TmpK belong to class II (type II) enzymes which display superior 
phosphorylation properties for nucleoside and nucleotide analogs, in particular for 
AZT. Thus, it is not necessary to further modify such prokaryotic proteins 
according to the method of the invention but it is possible to employ them in a 
pharmaceutical composition of the invention unmodified or substantially 
unmodified. 

Examples of suitable pharmaceutical carriers are well known in the art and include 
phosphate buffered saline solutions, water, emulsions, such as oil/water 
emulsions, various types of wetting agents, sterile solutions etc. Compositions 
comprising such carriers can be formulated by well known conventional methods. 
These pharmaceutical compositions can be administered to the subject at a 
suitable dose. Administration of the suitable compositions may be effected by 
different ways, e.g., by intravenous, intraperitoneal, subcutaneous, intramuscular, 
topical or intradermal administration. The dosage regimen will be determined by 
the attending physician and other clinical factors. As is well known in the medical 
arts, dosages for any one patient depend upon many factors, including the 
patient's size, body surface area, age, the particular compound to be administered, 
sex, time and route of administration, general health, and other drugs being 
administered concurrently. Generally, the regimen as a regular administration of 
the pharmaceutical composition should be in the range of 1 µg to 10 mg units per 
day. If the regimen is a continuous infusion, it should also be in the range of 1 µg 
to 10 mg units per kilogram of body weight per minute, respectively. Progress can 
be monitored by periodic assessment. Dosages will vary but a preferred dosage 
for intravenous administration of DNA is from approximately 10^6 to 10^12 copies of 
the DNA molecule. The compositions of the invention may be administered locally 
or systemically. Administration will generally be parenterally, e.g., intravenously; 
DNA may also be administered directly to the target site, e.g., by biolistic delivery 
to an internal or external target site or by catheter to a site in an artery. 

It is envisaged by the present invention that the various polynucleotides and 
vectors of the invention are administered either alone or in any combination using 
standard vectors and/or gene delivery systems, and optionally together with an 
appropriate compound, for example a nucleoside or nucleotide analog, and/or 
together with a pharmaceutically acceptable carrier or excipient. Subsequent to 
administration, said polynucleotides or vectors may be stably integrated into the 
genome of the subject. On the other hand, viral vectors may be used which are 
specific for certain cells or tissues, preferably for CD4 cells and persist in said 
cells. Suitable pharmaceutical carriers and excipients are well known in the art. 
The pharmaceutical compositions prepared according to the invention can be used 
for the prevention or treatment or delaying of different kinds of diseases, which are 
related to viral infection or cancer. 

Furthermore, it is possible to use a pharmaceutical composition of the invention 
which comprises a polynucleotide or vector of the invention in gene therapy. Suitable 
gene delivery systems may include liposomes, receptor-mediated delivery 
systems, naked DNA, and viral vectors such as herpes viruses, retroviruses, 
adenoviruses, and adeno-associated viruses, among others. Delivery of nucleic 
acids to a specific site in the body for gene therapy may also be accomplished 
using a biolistic delivery system, such as that described by Williams (Proc. Natl. 
Acad. Sci. USA 88 (1991), 2726-2729). 

Standard methods for transfecting cells with recombinant DNA are well known to 
those skilled in the art of molecular biology, see, e.g., WO 94/29469. Gene therapy 
may be carried out by directly administering the recombinant DNA molecule or 
vector of the invention to a patient or by transfecting cells such as CD4 with the 
polynucleotide or vector of the invention ex vivo and infusing the transfected cells 
into the patient. Furthermore, research pertaining to gene transfer into cells of the 
germ line is one of the fastest growing fields in reproductive biology. Gene therapy, 
which is based on introducing therapeutic genes into cells by ex-vivo or in-vivo 
techniques is one of the most important applications of gene transfer. Suitable 
vectors and methods for in-vitro or in-vivo gene therapy are described in the 
literature and are known to the person skilled in the art; see, e.g., WO 94/29469, 
WO 97/00957 or Schaper (Current Opinion in Biotechnology 7 (1996), 635-640) 
and references cited above. The polynucleotides and vectors comprised in the 
pharmaceutical composition of the invention may be designed for direct 
introduction or for introduction via liposomes, or viral vectors (e.g. adenoviral, 
retroviral) containing said recombinant DNA molecule into the cell. Preferably, said 
cell is a germ line cell, embryonic cell, stem cell or egg cell or derived therefrom. 
The pharmaceutical compositions according to the invention can be used for the 
treatment of diseases hitherto unknown as being related to viral infection or 
cancer. An embryonic cell can be for example an embryonic stem cell as described 
in, e.g., Nagy, Proc. Natl. Acad. Sci. USA 90 (1993) 8424-8428. 

It is to be understood that the introduced polynucleotides and vectors of the 
invention express the polypeptide or protein having kinase activity for nucleoside 
or nucleotide analogs after introduction into said cell and preferably remain in this 
status during the lifetime of said cell. For example, cell lines which stably express 
the polynucleotide under the control of appropriate regulatory sequences may be 
engineered according to methods well known to those skilled in the art. Rather 
than using expression vectors which contain viral origins of replication, host cells 
can be transformed with the polynucleotide or vector of the invention and a 
selectable marker, either on the same or separate vectors. Following the 
introduction of foreign DNA, engineered cells may be allowed to grow for 1-2 days 
in an enriched medium, and then are switched to a selective medium. The selectable 
marker in the recombinant plasmid confers resistance to the selection and allows 
for the selection of cells having stably integrated the plasmid into their 
chromosomes and grow to form foci which in turn can be cloned and expanded 
into cell lines. Such engineered cell lines are particularly useful in screening 
methods described below. 

A number of selection systems may be used, including but not limited to the 
herpes simplex virus thymidine kinase (Wigler, Cell 11 (1977), 223), hypoxanthine- 
guanine phosphoribosyltransferase (Szybalska, Proc. Natl. Acad. Sci. USA 48 
(1962), 2026), and adenine phosphoribosyltransferase (Lowy, Cell 22 (1980), 817) 
in tk-, hgprt- or aprt- cells, respectively. Also, antimetabolite resistance can be used 
as the basis of selection for dhfr, which confers resistance to methotrexate (Wigler, 
Proc. Natl. Acad. Sci. USA 77 (1980), 3567; O'Hare, Proc. Natl. Acad. Sci. USA 78 
(1981), 1527), gpt, which confers resistance to mycophenolic acid (Mulligan, Proc. 
Natl. Acad. Sci. USA 78 (1981), 2072); neo, which confers resistance to the 
aminoglycoside G-418 (Colberre-Garapin, J. Mol. Biol. 150 (1981), 1); hygro, 
which confers resistance to hygromycin (Santerre, Gene 30 (1984), 147); or 
puromycin (pat, puromycin N-acetyl transferase). Additional selectable genes have 
been described, for example, trpB, which allows cells to utilize indole in place of 
tryptophan; hisD, which allows cells to utilize histidinol in place of histidine (Hartman, 
Proc. Natl. Acad. Sci. USA 85 (1988), 8047); and ODC (ornithine decarboxylase) 
which confers resistance to the ornithine decarboxylase inhibitor, 2- 
(difluoromethyl)-DL-ornithine, DFMO (McConlogue, 1987, In: Current 
Communications in Molecular Biology, Cold Spring Harbor Laboratory ed.). Cells 
to be used for ex vivo gene therapy are well known to those skilled in the art, 
for example, mature cells of the immune system present in blood or, preferably, 
the corresponding stem cells. 

In a preferred embodiment of the invention said protein in the pharmaceutical 
composition is a bacterial nucleoside or nucleotide kinase, preferably a bacterial 
TmpK. As described in the appended examples, the TmpK of E. coli or its structure 
with respect to the P-loop and the LID region appear to be particularly suited for 
the activation of nucleoside and nucleotide analogs, e.g., AZT. Thus, in a preferred 
embodiment of the pharmaceutical composition of the invention said protein to be 
employed in accordance with the present invention has at least the P-loop and/or 
the LID region of E. coli TmpK, for example, said TmpK comprises the amino acid 
sequence of E. coli TmpK, e.g., that shown in SEQ ID NO: 4 or a biologically active 
fragment thereof. 

The present invention also relates to compositions comprising at least one of the 
aforementioned polynucleotides, vectors, polypeptides, proteins or antibodies, and 
in the case of kits or diagnostic compositions, optionally suitable means for 
detection. Said compositions may further contain compounds such as nucleoside 
or nucleotide analogs, further plasmids, antibiotics and the like for screening 
transgenic cells useful for the genetic engineering of non-human animals, 
preferably mammals and most preferably mice. The diagnostic compositions of 
the invention may be used for methods of detecting and isolating nucleoside or 
nucleotide analogs which are functionally equivalent to, e.g., AZT or d4T. 

The present invention also relates to a method for the production of a transgenic 
non-human animal, preferably transgenic mouse, comprising introduction of a 
polynucleotide or vector of the invention into a germ cell, an embryonic cell, stem 
cell or an egg or a cell derived therefrom. The non-human animal to be used in the 
method of the invention may be a non-transgenic healthy animal, or may have a 
viral disease or cancer, preferably a disease caused by infection of a retrovirus 
such as HIV, HTLV or related viruses. Production of transgenic embryos and 
screening of those can be performed, e.g., as described by A. L. Joyner Ed., Gene 
Targeting, A Practical Approach (1993), Oxford University Press. The DNA of the 
embryonal membranes of embryos can be analyzed using Southern blots with an 
appropriate probe. 

The invention also relates to transgenic non-human animals such as transgenic 
mice, rats, hamsters, dogs, monkeys, rabbits or pigs comprising a 
polynucleotide or vector of the invention or obtained by the method described 
above, preferably wherein said polynucleotide or vector is stably integrated into the 
genome of said non-human animal, preferably such that the presence of said 
polynucleotide or vector leads to the expression of the polypeptide of the invention. 

With the polypeptides, polynucleotides and vectors of the invention, it is now 
possible to study in vivo and in vitro the efficiency of nucleoside and nucleotide 
analogs. Furthermore, since the polypeptides of the invention provide for optimal 
or at least improved activation of said nucleoside or nucleotide analog, it is now 
possible to determine further analogs which may be effective for the treatment of 
viral diseases or cancer, for example specific tumors or AIDS. 

The present invention further relates to a method for identifying an inhibitor of a 
nucleoside or nucleotide kinase comprising the steps of: 

(a) contacting the polypeptide of the invention or a cell expressing said 
polypeptide in the presence of components capable of providing a 
detectable signal in response to kinase activity, with a compound to be 
screened under conditions that permit binding of said compound to the 
nucleoside or nucleotide kinase, and 

(b) detecting presence or absence of a signal generated from the kinase 
activity of the polypeptide, wherein the absence or decrease of the signal is 
indicative for an inhibitor of a nucleoside or nucleotide kinase. 

Furthermore, the invention relates to a method for identifying a nucleoside or 
nucleotide based prodrug comprising the steps of 

(a) contacting the polypeptide of the invention or a cell expressing said 
polypeptide in the presence of components capable of providing a 
detectable signal in response to kinase activity, with a nucleoside or 
nucleotide analog compound to be screened under conditions that permit 
kinase activity of said polypeptide, and 

(b) detecting presence or absence of a signal generated from the kinase 
activity of the polypeptide, wherein the presence of a signal is indicative for 
a putative prodrug. 
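
The two readout rules stated in steps (b) above can be illustrated by a minimal sketch. The function names, the choice of a no-compound control and a background reading as references, and the numerical thresholds (min_decrease, min_fold) are assumptions made purely for illustration; the description does not prescribe any particular cutoff or signal format. 

```python
def is_inhibitor(signal_with_compound: float,
                 signal_without_compound: float,
                 min_decrease: float = 0.5) -> bool:
    """Inhibitor screen, step (b): absence or decrease of the kinase signal.

    min_decrease is a hypothetical cutoff (fraction of the control signal
    that must be lost); it is not taken from the description.
    """
    if signal_without_compound <= 0:
        raise ValueError("control signal must be positive")
    remaining = signal_with_compound / signal_without_compound
    return remaining <= 1.0 - min_decrease


def is_putative_prodrug(signal_with_analog: float,
                        background: float,
                        min_fold: float = 3.0) -> bool:
    """Prodrug screen, step (b): presence of a kinase signal for the analog.

    min_fold is a hypothetical cutoff (signal relative to background).
    """
    if background <= 0:
        raise ValueError("background must be positive")
    return signal_with_analog >= min_fold * background


# Illustrative values in arbitrary signal units.
print(is_inhibitor(20.0, 100.0))        # signal drops to 20 % of control
print(is_putative_prodrug(50.0, 5.0))   # signal is 10-fold over background
```

In practice the cutoffs would be chosen from the variability of the assay actually used, for example one of the assays of the appended examples. 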

The term "compound" in a method of the invention includes a single substance or 
a plurality of substances which may or may not be identical. 

Said compound(s) may be comprised in, for example, samples, e.g., cell extracts 
from, e.g., plants, animals or microorganisms. Furthermore, said compounds may 
be known in the art but hitherto not known to be capable of inhibiting a nucleoside 
or nucleotide kinase or not known to be useful as a prodrug, respectively. The 
plurality of compounds may be, e.g., added to the culture medium or injected into a 
cell or non-human animal of the invention. 

If a sample containing (a) compound(s) is identified in the method of the invention, 
then it is either possible to isolate the compound from the original sample identified 
as containing the compound in question, or one can further subdivide the original 
sample, for example, if it consists of a plurality of different compounds, so as to 
reduce the number of different substances per sample and repeat the method with 
the subdivisions of the original sample. It can then be determined whether said 
sample or compound displays the desired properties, for example, by the methods 
described herein, the appended examples or in the literature, e.g., Guettari (1997). 
Depending on the complexity of the samples, the steps described above can be 
performed several times, preferably until the sample identified according to the 
method of the invention only comprises a limited number of or only one 
substance(s). Preferably said sample comprises substances of similar chemical 
and/or physical properties, and most preferably said substances are identical. The 
methods of the present invention can be easily performed and designed by the 
person skilled in the art, for example in accordance with other cell based assays 
described in the prior art or by using and modifying the methods as described in 
the appended examples. Furthermore, the person skilled in the art will readily 
recognize which further compounds and/or enzymes may be used in order to 
perform the methods of the invention, for example, enzymes, if necessary, that 
convert a certain compound into the precursor which in turn represents a substrate 
for the kinase of the invention. Such adaptation of the method of the invention is 
well within the skill of the person skilled in the art and can be performed without 
undue experimentation. 
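
The repeated subdivision of a positive sample described above can be sketched as a simple recursive procedure. This is an illustration only: the halving strategy, the function deconvolute and the stand-in assay mock_assay are hypothetical, and the sketch assumes that a sub-sample scores positive whenever it contains at least one active substance (no masking or synergy between substances). 

```python
from typing import Callable, List, Sequence


def deconvolute(sample: Sequence[str],
                is_active: Callable[[Sequence[str]], bool]) -> List[str]:
    """Subdivide a positive sample until single active substances remain."""
    if not is_active(sample):
        return []                      # negative sub-sample: discard
    if len(sample) == 1:
        return [sample[0]]             # only one substance left: a hit
    mid = len(sample) // 2             # split and repeat the method on each half
    return (deconvolute(sample[:mid], is_active)
            + deconvolute(sample[mid:], is_active))


# Hypothetical stand-in for the screening method: a sub-sample is positive
# if it contains the (here, pre-defined) active substance.
ACTIVE = {"cmpd_07"}


def mock_assay(pool: Sequence[str]) -> bool:
    return any(c in ACTIVE for c in pool)


pool = [f"cmpd_{i:02d}" for i in range(16)]
found = deconvolute(pool, mock_assay)
print(found)
```

Halving keeps the number of assay rounds roughly logarithmic in the number of substances when few of them are active; other subdivision schemes, for instance by chemical or physical properties, fit the same pattern. 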

Compounds which can be used in accordance with the present invention include 
peptides, proteins, nucleic acids, antibodies, small organic compounds, ligands, 
peptidomimetics, PNAs and the like. Said compounds can also be functional 
derivatives or analogues of known nucleoside or nucleotide analogs. Methods for 
the preparation of chemical derivatives and analogues are well known to those 
skilled in the art and are described in, for example, Beilstein, Handbook of Organic 
Chemistry, Springer edition New York Inc., 175 Fifth Avenue, New York, N.Y. 
10010 U.S.A. and Organic Synthesis, Wiley, New York, USA. Furthermore, said 
derivatives and analogues can be tested for their effects according to methods 
known in the art or as described, for example, in the appended examples. 
Furthermore, peptide mimetics and/or computer aided design of appropriate 
nucleoside and nucleotide derivatives and analogues can be used, for example, 
according to the methods described below. Nucleoside and nucleotide analogs 
comprise molecules having as the basis structure a ribo-, deoxyribo- or 
dideoxyribonucleoside. 

Appropriate computer programs can be used for the identification of interactive 
sites of a putative inhibitor and the polypeptide of the invention by 
computer-assisted searches for complementary structural motifs (Fassina, Immunomethods 
5 (1994), 114-120). Further appropriate computer systems for the computer aided 
design of protein and peptides are described in the prior art, for example, in Berry, 
Biochem. Soc. Trans. 22 (1994), 1033-1036; Wodak, Ann. N. Y. Acad. Sci. 501 
(1987), 1-13; Pabo, Biochemistry 25 (1986), 5987-5991. The results obtained from 
the above-described computer analysis can be used in combination with the 
method of the invention for, e.g., optimizing known nucleoside or nucleotide 
analogs. Appropriate peptide mimetics and other inhibitors can also be identified 
by the synthesis of peptide mimetic combinatorial libraries through successive 
chemical modification and testing the resulting compounds, e.g., according to the 
methods described herein and in the appended examples. Methods for the 
generation and use of peptide mimetic combinatorial libraries are described in the 
prior art, for example in Ostresh, Methods in Enzymology 267 (1996), 220-234 and 
Dorner, Bioorg. Med. Chem. 4 (1996), 709-715. Furthermore, the three-dimensional 
and/or crystallographic structure of inhibitors of the polypeptide of the 
invention can be used for the design of peptide mimetic inhibitors or nucleoside 
and nucleotide analogs, e.g., in combination with the polypeptide of the invention 
(Rose, Biochemistry 35 (1996), 12933-12944; Rutenber, Bioorg. Med. Chem. 4 
(1996), 1545-1558). 

The compounds identified according to the method of the invention, in particular 
nucleoside and nucleotide analogs are expected to be very beneficial since the 
prodrugs that have been used so far are only of limited use due to their inefficient 
metabolism in the subject. 

In summary, the present invention provides methods for identifying compounds 
which inhibit kinase activity as well as compounds that can be used as nucleoside 
or nucleotide based prodrugs even in the absence of a polypeptide or protein 
having improved kinase activity for nucleoside and nucleotide analogs. 
Compounds found to downregulate the activity of kinases may be used in the 
treatment of cancer and related diseases. In addition, it may also be possible to 
specifically inhibit viral kinases, thereby preventing viral infection or viral spread. 
The compounds identified or obtained according to the method of the present 
invention are thus expected to be very useful in diagnostic and in particular for 
therapeutic applications. Hence, in a further embodiment the invention relates to a 
method for the production of a pharmaceutical composition comprising the steps of 

(a) contacting the polypeptide of the invention or a cell expressing said 
polypeptide in the presence of components capable of providing a 
detectable signal in response to kinase activity, with a compound to be 
screened under conditions that permit binding of said compound to the 
nucleoside or nucleotide kinase, and 

(b) detecting presence or absence of a signal generated from the kinase 
activity of the polypeptide, wherein the absence or decrease of the signal is 
indicative for an inhibitor of a nucleoside or nucleotide kinase, or 

(a') contacting the polypeptide of the invention or a cell expressing said 
polypeptide in the presence of components capable of providing a 
detectable signal in response to kinase activity, with a nucleoside or 
nucleotide analog compound to be screened under conditions that permit 
kinase activity of said polypeptide, and 

(b') detecting presence or absence of a signal generated from the kinase 
activity of the polypeptide, wherein the presence of a signal is indicative for 
a putative prodrug; and 

(c) formulating the inhibitor identified in step (b) or the nucleoside or nucleotide 
analog identified in step (b') in a pharmaceutically acceptable form. 

The therapeutically useful compounds identified according to the method of the 
invention may be administered to a patient by any appropriate method for the 
particular compound, e.g., orally, intravenously, parenterally, transdermally, 
transmucosally, or by surgery or implantation (e.g., with the compound being in the 
form of a solid or semi-solid biologically compatible and resorbable matrix) at or 
near the site where the effect of the compound is desired. Therapeutic doses are 
determined to be appropriate by one skilled in the art, see supra. 

Furthermore, the present invention relates to the use of 

(a) a prokaryotic protein having nucleoside or nucleotide kinase activity for a 
nucleoside or nucleotide analog or a polynucleotide encoding and capable 
of expressing said protein in vivo or a vector containing said polynucleotide, 

(b) the polypeptide, the polynucleotide or the vector of the invention; and/or 

(c) the nucleoside or nucleotide analog identified in the method of the invention 

for the preparation of a pharmaceutical composition for the activation of nucleoside 
or nucleotide analogs or nucleoside or nucleotide based prodrugs and/or for the 
treatment of viral infections and/or diseases or cancer. Preferably said activation 
results in a cytotoxic nucleoside or nucleotide. In a preferred embodiment of the 
use of the invention said viral infection is HIV infection. 

In a further embodiment the present invention relates to the use of the inhibitor 
obtainable by the method of the invention for the preparation of a pharmaceutical 
composition for inhibiting virus replication or for treating cancer. As mentioned 
above, the inhibitor identified and obtainable by the method of the invention may 
be used to specifically inhibit the activity of a viral kinase. This would mean that the 
activation of nucleosides and/or nucleotides necessary for viral replication is 
suppressed, which may in turn prevent the production of viral 
progeny. 

In a preferred embodiment of the invention, the pharmaceutical compositions, e.g., 
to be used as described above further comprise or are designed to be 
administered with a nucleoside or nucleotide analog, preferably AZT or d4T. 

In a still further embodiment, the present invention relates to a method for the 
preparative synthesis of a nucleoside phosphate analog comprising: 

(a) using a polynucleotide of the invention or as defined above in a noncellular 
system or in a cell ex vivo, and 

(b) formulating the cells modified in step (a) in a pharmaceutically acceptable 
form. 

In another embodiment the present invention relates to the use of 

(a) a prokaryotic protein having nucleoside or nucleotide kinase activity for a 
nucleoside or nucleotide analog or a polynucleotide encoding and capable 
of expressing said protein in vivo or a vector containing said polynucleotide, 
or 

(b) the polypeptide, the polynucleotide or the vector of the invention 

for the preparation of nucleoside phosphates or analogs and derivatives thereof. 
Prior to the present invention it was cost-intensive or even impossible to produce 
analogs of nucleosides or nucleotides useful in diagnostic and therapeutic 
approaches; examples are phosphorylated forms of AZT and d4T. With the help of 
the present invention it is now possible to prepare such nucleoside and nucleotide 
analogs, for example, β- or γ-phosphate labeled nucleotides. For instance, the 
polypeptide of the invention is particularly suited for use in the preparation of 
nucleoside or nucleotide analogs in which it can be utilized in liquid phase or 
bound to a solid phase carrier. The polypeptide of the invention can be bound to 
many different carriers and used to produce nucleoside and nucleotide analogs. 
Examples of well-known carriers include glass, polystyrene, polyvinyl chloride, 
polypropylene, polyethylene, polycarbonate, dextran, nylon, amyloses, natural and 
modified celluloses, polyacrylamides, agaroses, and magnetite. The nature of the 
carrier can be either soluble or insoluble for purposes of the invention. Those 
skilled in the art will know of other suitable carriers for binding the polypeptide or 
will be able to ascertain such, using routine experimentation. There are many 
different labels known to those of ordinary skill in the art. Examples of the types of 
labels which can be used in the present invention include radioisotopes, colloidal 
metals, fluorescent compounds, chemiluminescent compounds, and 
bioluminescent compounds. On the other hand, a cell of the invention may be 
used in a fermentation process for the production of such nucleoside or nucleotide 
analogs. 

These and other embodiments are disclosed and encompassed by the description 
and examples of the present invention. Further literature concerning any one of the 
methods, uses and compounds to be employed in accordance with the present 
invention may be retrieved from public libraries and databases, using for example 
electronic devices. For example the public database "Medline" may be utilized 
which is available on the Internet, for example under 
http://www.ncbi.nlm.nih.gov/PubMed/medline.html. Further databases and 
addresses, such as http://www.ncbi.nlm.nih.gov/, http://www.infobiogen.fr/, 
http://www.fmi.ch/biology/research_tools.html, http://www.tigr.org/, are known to 
the person skilled in the art and can also be obtained using, e.g., 
http://www.lycos.com. An overview of patent information in biotechnology and a 
survey of relevant sources of patent information useful for retrospective searching 
and for current awareness is given in Berks, TIBTECH 12 (1994), 352-364. 

The pharmaceutical compositions, uses and methods of the invention can be used 
advantageously for the treatment of all kinds of diseases hitherto unknown as 
being related to or dependent on viral diseases or cancer. The pharmaceutical 
compositions, methods and uses of the present invention may be desirably 
employed in humans, although animal treatment is also encompassed by the 
methods and uses described herein. 



The figures show: 

Figure 1 The TP5A-TmpK complex structure. a. A ribbon diagram of a 
monomer (TmpK is a homodimer) with the helices drawn in red and 
strands in green. The TP5A bound at the active site is in cyan with 
the 5 phosphorus atoms in purple. The LID region adopts a helical 
conformation in the TP5A-bound complex, in contrast to a coil with no 
secondary structure when only dTMP is present. b. Stereo view of the 
TP5A-TmpK crystal packing. The monomers can be designated active 
(green) or inactive (red) according to the presence of a symmetry-related 
arginine (depicted as ball-and-stick) in the active site. The 
TP5A molecules are shown in cyan. Note that the arginines in green 
penetrate the active site of the red monomers (unlike the red-colored 
arginines that are further away) and thus hinder the formation of an 
active conformation by those monomers. 

Figure 2 a. Stereo view of a simulated-annealing omit map of TP5A and 
arginine 15 from an active monomer contoured at 3 sigma. Arginine 
15, which is located in the P-loop (shown as coil running left-to-right), 
interacts with the middle phosphate of TP5A. b. Stereo view of the 
active site. The TP5A nucleotide is drawn in yellow except for the 5 
phosphorus atoms that are in purple. c. A distance map with the 
same view as above. P-loop residues are additionally marked with an 
asterisk. 

Figure 3 Steady-state kinetics of dTMP versus AZT-MP with E. coli TmpK. 

a. The catalytic activity of E. coli TmpK (16.4 nM) with dTMP and 
ATP was measured as described in Methods at dTMP concentrations 
indicated at the X-axis, and the following ATP concentrations: 15 µM 
(○), 25 µM (●), 50 µM (□), 100 µM (■), 300 µM (△) and 1000 
µM (▲). The maximal catalytic activity attained was kcat = 15 s⁻¹ 
with KM dTMP of 2.7 µM, with ATP and dTMP binding with a factor for 
synergism of 6. 

b. The catalytic activity of E. coli TmpK (15.1 nM) with AZT-MP and 
ATP. The ATP concentrations were as indicated in a. The turnover 
number (kcat) reached a maximum of 6 s⁻¹ at saturating 
concentrations of both substrates. The binding of dTMP and ATP is 
synergistic by a factor of 6. The apparent KM of dTMP is lowered 
from 17 µM (extrapolated to no ATP) to 2.7 µM at saturating ATP 
concentration, and likewise the apparent KM value of ATP is lowered 
from approximately 50 µM to 8 µM. This synergism of binding is 
completely absent with ATP and AZT-MP, the KM value of ATP being 
around 50 µM independent of the AZT-MP concentration and that of 
AZT-MP (30 µM) being very close to dTMP (17 µM) in the absence of 
ATP. Therefore, it is predicted that the 3'-OH group of dTMP may be 
directly involved in the synergism of ATP and dTMP binding. 
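
The synergism described in this legend can be illustrated with a rapid-equilibrium random bi-substrate rate law. This is a sketch under that assumed model only (the legend does not state which rate equation was fitted); the function names are hypothetical, and the constants are the values quoted above.

```python
def turnover(atp, dtmp, kcat, k_atp, k_dtmp, synergy):
    """Turnover (s^-1) for a rapid-equilibrium random bi-substrate model.

    k_atp and k_dtmp are the constants for binding to the free enzyme (µM);
    binding to the enzyme already carrying the other substrate is assumed
    to be tighter by the factor `synergy`.
    """
    alpha = 1.0 / synergy
    denom = (alpha * k_atp * k_dtmp + alpha * k_dtmp * atp
             + alpha * k_atp * dtmp + atp * dtmp)
    return kcat * atp * dtmp / denom


def apparent_km_dtmp(atp, k_atp, k_dtmp, synergy):
    """Apparent KM for dTMP (µM) at a fixed ATP concentration (µM)."""
    alpha = 1.0 / synergy
    return alpha * k_dtmp * (k_atp + atp) / (alpha * k_atp + atp)


# Values quoted in the legend (µM and s^-1).
KCAT, K_ATP, K_DTMP, SYNERGY = 15.0, 50.0, 17.0, 6.0

for atp in (15, 25, 50, 100, 300, 1000):
    print(atp, round(apparent_km_dtmp(atp, K_ATP, K_DTMP, SYNERGY), 2))
```

In this model the apparent KM for dTMP runs from 17 µM without ATP towards 17/6, about 2.8 µM, at saturating ATP, close to the 2.7 µM quoted. Setting the synergy factor to 1 makes the apparent KM independent of the ATP concentration, which is the behaviour described for AZT-MP.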

Figure 4 Sequence alignment (using GCG) of 3 eukaryotic (KTHY_SCHPO is 
TmpK from Schizosaccharomyces pombe) and the E. coli thymidylate 
kinase amino-acid sequences. Shaded black are residues conserved 
in all sequences, in gray are similar amino-acids found in at least 3 of 
the sequences. The secondary structural elements (helices as tubes, 
strands as arrows) of the yeast TmpK are also shown. Residues that 
contact TP5A directly are marked with an arrow. The LID part of the 
sequence is marked as a dotted line; note the difference in the P-loop 
region between the eukaryotic TmpKs and the E. coli enzyme 
(between β1 and α1), and the additional basic residues found only in 
E. coli TmpK. 

Figure 5 Comparison of yeast thymidylate kinase (thick line) with HSV1 
thymidine kinase (thin line) (pdb code 1kin). a. Superposition of both 
kinases (both are homodimers; only a monomer is shown) with TmpK 
in blue and HSV1-TK in yellow. The TP5A bound to TmpK is also 
shown (cyan) as is the thymidine bound to HSV1-TK (orange). b. 
Stereo view (rotated 180° relative to the figure above) with a close-up 
of the active sites. Thymidine (orange) is displaced relative to the 
thymidine moiety of TP5A (cyan). As a result, where in TmpK Tyr102 
discriminates against ribonucleotides, the corresponding Tyr172 in 
HSV1-TK makes a base-stacking interaction with the thymidine base, 
a role accomplished by Phe69 in TmpK. 

Abbreviations: 

TmpK, thymidylate kinase 
UmpK, uridylate kinase 
AK, adenylate kinase 
HSV1-TK, herpes simplex virus thymidine kinase, type I 
AZT, 3'-deoxy-3'-azido thymidine 
AZT-MP, 3'-deoxy-3'-azido thymidine monophosphate 
AZT-DP, 3'-deoxy-3'-azido thymidine diphosphate 
AZT-TP, 3'-deoxy-3'-azido thymidine triphosphate 
TP5A, P1-(5'-adenosyl) P5-(5'-thymidyl) pentaphosphate 
UP5A, P1-(5'-adenosyl) P5-(5'-uridyl) pentaphosphate 
PA, PB, PC, PD, PE, phosphate groups of TP5A 
NMP kinases, nucleoside monophosphate kinases 
ncs, non-crystallographic symmetry 
RMSD, root mean square deviation 

The examples illustrate the invention. 

Thymidylate kinase (E.C. 2.7.4.9; ATP:dTMP phosphotransferase) catalyzes the 
phosphorylation of thymidine monophosphate (dTMP) to thymidine diphosphate 
(dTDP) utilizing ATP as its preferred phosphoryl donor (Jong & Campbell, 1984) 
according to the scheme: 

dTMP + ATP·Mg ⇌ dTDP + ADP·Mg 



Its location at the junction of the de-novo and salvage pathways for thymidine 
triphosphate (dTTP) synthesis makes thymidylate kinase (TmpK) an essential 
enzyme for cell proliferation, and thus an attractive target for the development of 
drugs against cancer. In addition to its physiological role, TmpK is also involved in 
the activation of the AIDS drug 3'-azido-3'-deoxythymidine (AZT). AZT is a prodrug 
that must be phosphorylated three times to its triphosphate form (AZT-TP), since it 
is AZT-TP that inhibits viral replication by DNA chain termination. TmpK, which 
catalyzes the second phosphorylation step, from the monophosphate (AZT-MP) to 
the diphosphate (AZT-DP), has been shown to be the rate limiting enzyme in the 
AZT activation pathway (Furman et al., 1986, Qian et al., 1994). This results in a 
toxic accumulation of AZT-MP to millimolar concentration in cells exposed to AZT 
(Bridges et al., 1993, Törnevik et al., 1995, Yan et al., 1995), and in a low 
concentration of the active compound AZT-TP. Slow activation rates of prodrugs 
have been implicated in allowing the replicating virus to select for resistant 
mutants. 

The Examples of the present invention provide an understanding of the mechanism 
of phosphoryl transfer and pinpoint the amino acid residues involved, and thus 
provide a generally applicable method for developing and improving strategies 
for the treatment of cancer and AIDS. For example, the design of a 
mechanism-based inhibitor of TmpK resulting in the halt of dTTP synthesis, and thus cell 
proliferation, could play a role in chemotherapy of cancers. 

Example 1 : Structure of TmpK from yeast 

TmpK from Saccharomyces cerevisiae (cdc8 gene) has 216 amino acid residues 
and a molecular weight of 25 kD. Crystals of the complex between the bisubstrate 
inhibitor TP5A and yeast thymidylate kinase were obtained by the hanging drop 
method. A 12 mg/mL enzyme solution was premixed with a TP5A solution to a final 
concentration of 10 mg/mL enzyme and 2 mM TP5A. Equal volumes of the 
protein-nucleotide solution and a solution composed of 20% PEG 2000 monomethyl ether, 
100 mM sodium acetate pH 4.6, and 200 mM ammonium sulfate were mixed, and 
left to equilibrate at room temperature against a reservoir composed of the latter 
solution. Crystals with typical dimensions (µm) of 400 × 200 × 100 grew within 
days in space groups P1, P2₁, or P2₁2₁2 with very similar unit cell dimensions. 
This space group polymorphism made these crystals unsuitable for the initial 
structure determination (Lavie et al., 1997b). The structure reported here was 
solved by molecular replacement using the dTMP-TmpK complex structure as 
starting model (see below). A data set was collected at 100 K from a crystal 
soaked briefly in mother liquor with 25% glycerol as cryo-protectant, using a 
Siemens multiwire area detector mounted on a Mac Science rotating anode 
operating at 45 kV, 100 mA. Processing of the data was carried out with XDS 
(Kabsch, 1993). The structure reported here is from a crystal which reduced in 
space group P2₁ and diffracted to 1.8 Å resolution, see Table 1. 
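
The premixing step above fixes the concentrations only implicitly, so a short worked calculation may help. The sketch assumes additive volumes and a TP5A stock that contains no enzyme; the function names are hypothetical, and the stock concentration it derives is not stated in the text.

```python
def premix_ligand_stock(enzyme_stock, enzyme_final, ligand_final):
    """Return (volume fraction of enzyme stock, required ligand stock conc.).

    Assumes the final mixture consists only of enzyme stock plus ligand
    stock and that volumes are additive.
    """
    frac_enzyme = enzyme_final / enzyme_stock
    frac_ligand = 1.0 - frac_enzyme
    return frac_enzyme, ligand_final / frac_ligand


def mg_per_ml_to_mM(mg_per_ml, mass_kda):
    """Convert a protein concentration: (g/L) / (kg/mol) = mmol/L."""
    return mg_per_ml / mass_kda


# 12 mg/mL enzyme diluted to 10 mg/mL while reaching 2 mM TP5A.
frac, tp5a_stock_mM = premix_ligand_stock(12.0, 10.0, 2.0)
enzyme_mM = mg_per_ml_to_mM(10.0, 25.0)   # 25 kD monomer
print(round(frac, 3), round(tp5a_stock_mM, 1), enzyme_mM)
```

Under these assumptions five volumes of enzyme are mixed with one volume of a roughly 12 mM TP5A stock, and the 2 mM TP5A is in about five-fold molar excess over the 0.4 mM monomer. Mixing with an equal volume of precipitant then halves both concentrations in the initial drop.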

Table 1: Data Collection and Refinement Statistics 

Data collection 
Temperature (K)                              100 
Resolution range (Å)                         43.8 - 1.8 
Observed reflections                         270178 
Unique reflections                           116451 
Completeness (%, overall/last shell)         82.0/56.9 
Rsym a (%, overall/last shell)               5.9/26.6 
Space group                                  P2₁ 
Unit cell (Å, °)                             a=72.6 b=87.3 c=155.0 β=90.1 
Molecules / asymmetric unit                  8 

Refinement statistics 
Resolution range (Å)                         43.8 - 2.0 
Rfactor b / Rfree (%)                        20.9/27.9 
RMS deviations 
  bond lengths (Å)                           0.012 
  bond angles (°)                            1.64 
  dihedral angles (°)                        24.7 
  improper angles (°) 
Reflections with F > 0σ (working / test)     101931 / 5401 
Number of protein atoms                      13791 
Number of nucleotide atoms                   440 
Number of water molecules                    1198 
Average B (Å²) 
  main chain                                 20.5 
  side chain                                 22.1 
  waters                                     28.6 
  nucleotides                                16.3 

a Rsym = Σ|I - <I>| / ΣI 
b Rfactor = Σ||Fobs| - |Fcalc|| / Σ|Fobs|. 5% of reflections were used for Rfree. 

The TP5A complex structure (Figure 1) was solved by the molecular replacement 
method with the partially refined dimeric dTMP complex structure as the starting 
search model. As mentioned above, the TP5A co-crystals exhibit space group 
polymorphism; the data set used has P2₁ symmetry with 4 dimers in the 
asymmetric unit. The pseudo-symmetry with the orthorhombic space group 
suggested the presence of two tetramers in the monoclinic asymmetric unit. This 
implied that when searching with the TmpK dimer, one should expect 2 rotation 
function solutions, where each solution has two different translation function 
solutions, thus yielding the two tetramers. Both AMoRe (Navaza, 1994) and 
X-PLOR (Brunger, 1993) yielded the two expected rotation function solutions, but it 
was not possible to find the two appropriate translation function solutions. The 
self-Patterson map showed one clear peak that was interpreted as the vector relating 
the two tetramers. Thus, each of the two rotation function results was first applied 
to the search dimer, and then the translation vector from the self-Patterson was 
applied to the rotated dimers. The resulting two pairs of tetramers were used 
independently as search models for the translation function and yielded clear 
solutions. In P2₁ the origin along the unique axis is not defined, so the relative position 
of one tetramer to the other was found by arbitrarily keeping one tetramer at y=0, 
which also limits the translational search to two dimensions (x and z). At first, the 
refinement was carried out using strict non-crystallographic symmetry (ncs), which 
made it possible to work with a dimer instead of the octamer. The refinement proceeded 
smoothly with clear electron density for the TP5A bisubstrate inhibitor. When the 
R-factor reached 27% the ncs constraints were totally relaxed. The R-factor 
converged at 20% and 29% for the work and test set (5% of total reflections), 
respectively. Ncs restraints were reapplied at relatively low weight causing a slight 
decrease in Rfree and a bigger increase in Rwork. Consequently, the final model 
was refined with ncs-restraints grouping monomers 1, 3, 5, and 7 in one group and 
2, 4, 6, and 8 in the other (the LID sequence and nucleotides were not included). 
The numbering used is residues 1 to 216 and 501 to 716 for monomers 1 and 2, 
respectively, with TP5A numbered 217 in monomer 1 and 717 in monomer 2. 
Monomers 3 and 4 are numbered as above with the addition of 1000, 5 and 6 with 
the addition of 2000, and 7 and 8 with the addition of 3000 (thus TP5A 3217 is the 
TP5A bound to monomer 7). Model building was done with the program O (Jones 
et al., 1991). The present model consists of 8 monomers with some residues 
omitted because of poor electron density (residues 1, 2, 1137 to 1148, 2001, 
3001, 3501, 3502) and some residues modeled as alanines, 8 TP5A molecules, 
and 1198 water molecules (see Table 1). A simulated-annealing omit map 
(calculated without the TP5A and Arg15) with the current TP5A model 
superimposed is shown in Figure 2a. 
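
The residue numbering scheme described above can be written down compactly. The following is a sketch for illustration only; the function names are hypothetical and the code simply restates the offsets given in the text.

```python
N_RESIDUES = 216   # residues per TmpK monomer
TP5A_INDEX = 217   # number given to the bound TP5A within a monomer


def model_number(monomer, residue):
    """Map (monomer 1-8, residue 1-216 or 217 for TP5A) to the model number."""
    if not 1 <= monomer <= 8:
        raise ValueError("monomer must be 1..8")
    if not 1 <= residue <= TP5A_INDEX:
        raise ValueError("residue must be 1..217")
    dimer = (monomer - 1) // 2    # 0..3, adds 0, 1000, 2000 or 3000
    second = (monomer - 1) % 2    # 1 for the second monomer of a dimer, adds 500
    return 1000 * dimer + 500 * second + residue


def from_model_number(number):
    """Inverse mapping: model number -> (monomer, residue)."""
    dimer, rest = divmod(number, 1000)
    second, residue = divmod(rest, 500)
    return 2 * dimer + second + 1, residue


print(model_number(7, TP5A_INDEX))   # TP5A bound to monomer 7
print(from_model_number(1137))       # first residue of the omitted stretch
```

Decoding the omitted residues this way shows, for example, that 1137 to 1148 are residues 137 to 148 of monomer 3 and that 3501 and 3502 are the first two residues of monomer 8.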

Unlike the known structures of other nucleoside monophosphate (NMP) kinases 
(adenylate kinase, uridylate kinase, guanylate kinase) which are all monomeric, 
TmpK is a homodimer. Despite no amino acid sequence similarity between TmpK 
and any of the other NMP kinases, TmpK assumes the common fold of the other 
NMP kinases with 5 parallel beta strands forming a beta-sheet core surrounded by 
helices (Figure 1a). The highly hydrophobic dimer interface is composed of 3 
parallel α-helices provided by each monomer that stack against each other. 
The asymmetric unit of the P2₁ crystal form consists of eight monomers which 
pack in 4 dimers. A homodimer is the basic unit of yeast TmpK, but no 
cooperativity was detected in our kinetic assays (Lavie et al., 1997a). Upon 
overlaying all 8 monomers, it was evident that they can be divided into two groups; 
mon1, mon3, mon5 and mon7 adopt one conformation, whereas mon2, mon4, 
mon6, and mon8 adopt another conformation (as neither ncs constraints nor 
restraints were used in the last stages of refinement, the conformation adopted 
by each monomer is totally independent of the other monomers; in the final cycle 
of refinement ncs-restraints were reapplied). For reasons that become apparent 
later, the first group was designated as inactive monomers, and the second group 
of monomers as active (Figure 1b). The RMSD between the active and inactive 
monomers excluding the LID sequence is 0.3 Å (on Cα atoms). 
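
For readers unfamiliar with the measure, the RMSD quoted above is the root of the mean squared distance between paired atoms. The sketch below assumes the two coordinate sets have already been superimposed and list the same Cα atoms in the same order; it does not perform the superposition itself, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (same units as the input, e.g. Å)
    between two equally long lists of (x, y, z) positions."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Excluding a region such as the LID from the comparison amounts to leaving its atoms out of both lists before the call.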
Most likely it is the crystal packing which determines which monomers adopt the 
active conformation, and which do not. Apparently, an active monomer of one 
dimer inactivates a monomer of another dimer. Arg173 (a non-conserved residue) 
from all active monomers interacts with the TP5A bound to the inactive monomers 
through an oxygen atom of phosphate PB, whereas Arg173 from the inactive 
monomers cannot form such an interaction. The presence of Arg173 prevents the 
formation of an active conformation of those monomers. Thus, it is this interaction 
of Arg173 from an active monomer with the TP5A of an inactive monomer which 
presumably prevents the closure of the LID of those monomers and results in poor 
density. Since TmpK is a dimer in solution (based on gel-filtration and dynamic 
light scattering experiments) and binding studies with TP5A clearly 
indicate a single class of binding sites in solution, it is assumed that the partitioning 
into active and inactive monomers is a pure crystallization artifact with no 
physiological significance. Therefore, the discussion is limited to the active 
conformers. 

Example 2: Comparison with dTMP-TmpK complex structure 

Kinases undergo large conformational changes upon the binding of substrates 
(Vonrhein et al., 1995). At least 3 different states have been described: an open 
state in the absence of substrates, a partially closed state when one of the 
substrates is bound, and a closed state when both substrates are present. The 
previously reported dTMP-TmpK complex (Lavie et al., 1997b) represents the 
partially closed state (ATP is missing) whereas the TP5A complex represents the 
fully closed state, which is catalytically competent. Superposition of these two 
complexes shows that the main difference between them lies in the LID region (the 
other slight differences concentrate in loop sequences, which are expected to vary 
as a result of the different crystal packing between the two complexes). In NMP 
kinases the LID region has been observed to change its conformation upon ATP 
binding. In the dTMP-complex the LID sequence was only traceable in one of the 
monomers, and the density for the traced sequence was weak, indicating disorder, 
as expected in the absence of ATP. With TP5A, a much more ordered LID region 
was expected. In the monomers that were designated as the inactive 
conformer, the LID region is very hard to trace and lacks secondary structure. In 
contrast, the LID region of the active conformers was easier to trace and forms a 
helix (Figure 1a). Upon superposition of TmpK with uridylate kinase (see 
later for a detailed comparison with other NMP kinases), this newly formed helix 
overlays very well with a helix from the UmpK LID region (Scheffzek et al., 1996). 

Example 3: Binding of TP5A 

Titration of TmpK to a cuvette containing 0.09 mM of the fluorescent bisubstrate 
inhibitor TP5A-MANT results in a fluorescence increase that indicates complex 
formation. Equilibrium fluorescence measurements were performed as described 
(Reinstein et al., 1990) using a SLM 8100 spectrofluorimeter with an excitation 
wavelength of 360 nm and emission wavelength of 440 nm. The experiments were 
done at pH 7.5 in a solution containing 50 mM Tris/HCl, 5 mM MgCl2, 2 mM 
EDTA, and 100 mM KCl with the temperature of the cuvette being held constant at 
25 °C. The fluorescent N-methylanthraniloyl group (MANT) joined to the ribose of 
the adenosine moiety of the bisubstrate inhibitor TP5A (TP5A-MANT) was used as 
probe (Reinstein et al., 1990). 

Time resolved binding studies were performed with the same buffer as described 
above with a Hi-Tech Scientific SF61 stopped flow apparatus. As a signal for 
complex formation either the extrinsic signal of TP5A-MANT with excitation at 360 
nm and a cutoff at 420 nm or the intrinsic tryptophan signal with excitation at 295 
nm and a cutoff at 320 nm was used. These experiments were performed 
essentially as described (Packschies et al., 1997). The titration data were fitted by 
a quadratic equation (Reinstein et al., 1990) which yields a Kd of 135 nM for the 
binding of TP5A-MANT to TmpK. Competitive displacement of TP5A-MANT from 
the so formed complex by TP5A and analysis by a cubic equation (Thrall et al., 
1996) shows the Kd for the binding of TP5A to TmpK to be 95 nM. The affinity of 
TP5A to TmpK appears to be rather weak compared to E. coli adenylate kinase 
that binds AP5A with a Kd of 15 nM or to D. discoideum UmpK that binds UP5A 
with a Kd of 3 nM (Reinstein et al., 1990, Wiesmüller et al., 1995). Time resolved 
measurements show that the binding of TP5A and TP5A-MANT is relatively fast 
but perhaps somewhat below the diffusion controlled limit with kon = 10 × 
10⁶ M⁻¹ s⁻¹ and kon = 5.5 × 10⁶ M⁻¹ s⁻¹, respectively. This indicates that the 
conformational changes induced by the bisubstrate inhibitor, e.g. closing of the LID 
region over substrate, do not constitute the rate limiting step of the catalytic cycle 
since kcat of yeast TmpK was shown to be 35 s⁻¹ (Lavie et al., 1997a) whereas 
the binding of TP5A did not deviate from a linear relationship for kobs versus 
[TP5A] up to 60 s⁻¹. 
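
The two analyses mentioned above, a quadratic binding equation for the titration and a linear dependence of kobs on ligand concentration, can be sketched as follows. This assumes simple one-step 1:1 binding; the function names are hypothetical, the cited references give the equations actually used, and the off-rate computed below (Kd × kon) is a derived estimate under that assumption, not a value reported in the text.

```python
import math


def complex_conc(e_total, l_total, kd):
    """Concentration of the 1:1 complex from total enzyme, total ligand
    and Kd (all in the same units), without assuming excess of either."""
    s = e_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * e_total * l_total)) / 2.0


def k_obs(ligand, k_on, k_off):
    """Observed pseudo-first-order rate (s^-1) for one-step binding
    with the ligand in excess over the enzyme."""
    return k_on * ligand + k_off


# Values quoted in the text for TP5A.
KD_TP5A = 95e-9            # M
KON_TP5A = 10e6            # M^-1 s^-1
koff_est = KD_TP5A * KON_TP5A   # assumption: one-step binding

for conc_uM in (1, 2, 4, 6):
    print(conc_uM, round(k_obs(conc_uM * 1e-6, KON_TP5A, koff_est), 2))
```

With these numbers a kobs of 60 s⁻¹ corresponds to roughly 6 µM TP5A. A rate-limiting conformational step slower than this would show up as a plateau in kobs, which is the reasoning behind the comparison with kcat.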

In the crystal the bisubstrate inhibitor TP5A appears to be bound tightly by all 
monomers (see Figure 2), as can be inferred from the relatively low B-factors (see 
Table 1) and the excellent electron density for the nucleotides. Before describing in 
detail the interactions made between the TP5A di-nucleotide and the enzyme, it is 
important to realize that the TP5A molecule is not a transition-state analog but is 
rather a bisubstrate or a biproduct analog (Scheffzek et al., 1996). In other words, 
it is not clear whether the observed TP5A is closer to the ATP-dTMP or the 
ADP-dTDP state. 

The thymidine part of TP5A, which is completely surrounded by protein atoms, has 
a lower B-factor than the relatively exposed adenine part. The detailed interactions 
between the thymidine moiety and TmpK have already been described for the 
dTMP-TmpK complex (Lavie et al., 1997b), so only the differences in the TP5A 
complex will be outlined here. Lysine 37 is observed to interact with the phosphate 
of dTMP in the dTMP-TmpK complex structure. This phosphate (for phosphate 
notation see Figure 2c) corresponds to PE in TP5A. In the TP5A complex Lys37 
interacts with PD instead; PD can be seen either as the connector between dTMP 
(PE) and ATP (PA, PB, PC), and thus as an artificial moiety, or as the β phosphate 
of dTDP. In the other NMP kinase structures with such bisubstrate analogs, the 
position of this connecting phosphate varies (Abele & Schulz, 1995, Müller & 
Schulz, 1992, Scheffzek et al., 1996). Thus, it is hard to interpret this change in 
Lys37 interaction from PE to PD, but it is possible that this residue participates in 
stabilizing the transferred phosphate group in the transition state. 
The P-loop motif has been identified in many ATP- and GTP-binding proteins 
(Saraste et al., 1990). The binding of the nucleotide to the P-loop is through 
main-chain nitrogen atoms to the α and β phosphates of the nucleotide, and through a 
strictly conserved lysine. The ATP part of TP5A interacts in such a manner with the 
P-loop of TmpK, through the amides of Thr16 and Thr19 to the PB oxygen atoms, 
and Thr20 to PA oxygen atoms. In addition to the amide-phosphate interaction of 
Thr20, its side-chain also interacts with PA. The P-loop lysine (Lys18) is observed 
to interact with the PB and PC oxygen atoms of TP5A. It is positioned such that it 
may stabilize the transition state, in agreement with mutational analysis of 
adenylate kinase (Reinstein et al., 1988, Reinstein et al., 1990, Tsai & Yan, 1991). 
In the non-active monomers, Arg15 (situated at the tip of the P-loop) is in an 
extended conformation making no interaction with TP5A, while in the active 
monomers Arg15 bends to interact with a PC oxygen. It is in fact this interaction 
which led the inventors to divide the monomers into active and inactive, since the 
inventors attribute an essential catalytic role to this arginine (see below). 
The adenine part of TP5A makes only two interactions with the enzyme, both 
through the amino group at C6. One is to the main-chain carbonyl group of 
Lys187, the other through a water molecule to the highly conserved Gln21. This 
would explain the preference of TmpK for adenine nucleotides over guanine; in 
guanine-based nucleotides there is a carbonyl group at C6 instead of the amino 
group, which would prevent the favorable interaction with the main-chain carbonyl 
of Lys187. However, the partial acceptance of non-adenine-based nucleotides as 
phosphoryl donors by TmpK (Jong & Campbell, 1984) is consistent with our 
structure, as there are no strong interactions between the adenine base and the 
enzyme, and there would be no steric discrimination against the amino group at 
the C2 position of guanine, which would face towards the solvent. 
Bound magnesium was not observed in the structure, which is attributed to the low 
pH and the presence of ammonium sulfate in the crystallization conditions. 
Inferring from other NMP kinases (Scheffzek et al., 1996), the magnesium is 
expected to be octahedrally coordinated to the PB and PC oxygens 
(corresponding to the β and γ phosphates of ATP), Thr16, Asp93 (directly or through a 
water molecule), and two additional water molecules. At the position where 
magnesium is expected electron density was seen only for a water molecule 
bridging Asp93 and PC. 

Example 4: Comparison with other NMP kinases 

In addition to this TP5A-TmpK complex structure, there are at present 3 other 
structures of NMP kinases with bisubstrate analogs: adenylate kinase (yeast and 
E. coli) with AP5A (Abele & Schulz, 1995, Müller & Schulz, 1992), and uridylate 
kinase (slime mold) with UP5A (Scheffzek et al., 1996). The above mentioned 
adenylate kinases (AK) are both of the long LID region type, while uridylate kinase 
is more like TmpK having a short LID region (for the LID's amino acid sequence 
see Figure 4). Therefore, the inventors have chosen to compare TmpK to uridylate 
kinase (UmpK). The structure of UmpK complexed with UP5A was superimposed 
on the TP5A-TmpK complex structure by aligning the P-loop sequences of both 
enzymes. While UmpK and AK have a high degree of sequence identity, none 
exists between TmpK and either UmpK or AK, but structurally, TmpK overlays on 
UmpK surprisingly well and UP5A occupies nearly the same position as TP5A. 
There is, however, a striking difference. The LID domain of UmpK interacts directly 
with the phosphates of UP5A via the basic residues Arg131 and Arg137, which 
have been shown to stabilize the transition state (Schlichting & Reinstein, 1997). 
While lacking such basic residues in the LID domain, TmpK has Arg15 in the 
P-loop, the corresponding amino acid in UmpK being a glycine. And in fact, in what 
the inventors defined as the monomers in the active conformation, Arg15 makes a 
2.8 Å hydrogen-bond interaction with one oxygen atom of PC. This illustrates that 
what matters for catalysis is the presence of catalytic residues in proximity to the 
reaction center; whether they originate from the LID or from the P-loop is irrelevant. 

Example 5: Comparison with Herpes Simplex Virus-1 thymidine kinase 

From sequence similarity the Herpes Simplex Virus-1 thymidine kinase (HSV1-TK) 
has been grouped with the thymidylate kinase family. In fact, HSV1-TK has both a 
thymidine and a thymidylate kinase activity, suggesting that it evolved from 
thymidylate kinase and acquired the additional thymidine kinase activity. The 
3-dimensional X-ray structure of HSV1-TK has been solved recently revealing, as for 
TmpK, a dimer as the basic unit (Brown et al., 1995). The viral kinase, however, 
contains 376 amino acids compared to TmpK's 216. It has additional N-terminal 
(45 residues) and C-terminal sequences (87 residues), as well as a few short 
inserts in the overlapping sequence region. Nevertheless, the overlapping 
sequence regions can be relatively easily overlaid structurally, with both kinases 
displaying the basic kinase fold as previously described (Figure 5). 
The low level of substrate specificity of the viral kinase, which is medicinally 
important in the activation of nucleoside prodrugs, can now be rationalized upon 
comparison with the highly specific TmpK. Using the 14 amino-acid stretch of the 
P-loop regions of both enzymes for the superposition (Cα rms of 0.7 Å), it becomes 
apparent that the base moiety of thymidine bound to HSV1-TK is displaced by about 
3.5 Å relative to the position of the base in TmpK. The displacement is 
away from the core of the enzyme towards the solvent. This slippage of the 
nucleotide towards the active site residues (mainly basic residues which would 
interact with phosphate groups) but without the concomitant contraction of the 
nucleotide's base binding pocket explains why HSV1-TK can phosphorylate even 
purines in addition to pyrimidines, and how it can possess both thymidine and 
thymidylate kinase activity. Presumably, after the first phosphorylation of 
thymidine to thymidine monophosphate, the nucleotide has room to slip deeper into 
the binding site, thus placing the phosphate at the position previously occupied 
by the C5' hydroxyl. Now the second phosphorylation can take place. Thus, it is the 
nucleotide that moves relative to the enzyme's catalytic residues. This would 
suggest that HSV1-TK, while accepting purine nucleosides as substrates, might 
not accept purine monophosphates (the substrates for the thymidylate kinase 
activity), as they must be bound deeper in the binding cavity and thus might not 
fit. Indeed, the guanine-based anti-herpes drug acyclovir is phosphorylated to 
the monophosphate by HSV1-TK, but the following phosphorylation step is 
catalyzed by guanylate kinase (Elion, 1982), consistent with this interpretation. 
In contrast to the base, other residues in the active site overlay very well, 
noteworthy being Tyr102 (TmpK) and Tyr172 (HSV1-TK) (within 0.3 Å). Tyr102 is 
responsible for the discrimination against ribonucleotides in TmpK by being 
located 3.5 Å from the C2' of the ribose. In HSV1-TK the tyrosine fulfills a 
completely different role by base stacking with thymine (in TmpK, Phe69 stacks 
with the base analogously to this Tyr, but from the opposite side of the base; 
see Figure 5b). 
Important for the previous discussion of catalysis are the P-loop and LID regions; 
in this respect HSV1-TK is more similar to the E. coli TmpK than to the yeast TmpK 
in lacking the arginine in the P-loop (D14R15 in yeast, E12G13 and H58G59 in 
E. coli and HSV1-TK, respectively) and having basic residues in the LID region. 
Arg222 of HSV1-TK, which is in the LID region, is situated less than 4.5 Å from the 
deoxyribose O5' of thymidine. By analogy with other NMP kinases that utilize basic 
residues from the LID region to stabilize the negative charge developing in the 
transition state, we propose that Arg222 of HSV1-TK fulfills this catalytic role. 

Example 6: Implications for catalysis and AZT activation 

In addition to phosphorylating dTMP to dTDP, thymidylate kinase is also part of the 
activation pathway of the anti-HIV prodrug 3'-deoxy-3'-azidothymidine (AZT). In 
fact, TmpK is the rate-limiting enzyme in this activation pathway. Together with the 
previously reported nucleotide-complex structures with dTMP and with AZT-MP 
(Lavie et al., 1997b), the TP5A-TmpK complex structure suggests which residues 
are needed for catalysis, and why AZT-MP is phosphorylated by TmpK so poorly. 
Whether the phosphoryl transfer mechanism is of an associative nature, a 
dissociative one, or of a mixed type, the negative charge developing during the 
transition state compared to the ground state has to be stabilized. This would be 
achieved by a basic residue (Arg or Lys), and its interaction with the phosphate 
must be made or strengthened at the transition state. In UmpK it has been shown 
that the mechanism of phosphoryl transfer is most probably associative, and that 
the charge developing on the transferred phosphoryl group is stabilized mainly by 
Arg131 and Arg137 from the LID region, in addition to Lys19, Arg148 and the 
catalytic magnesium ion. In contrast, yeast TmpK lacks basic residues in its LID 
region, but does have an arginine residue at the tip of the P-loop. In the 
TmpK-TP5A complex the side-chain of this arginine occupies a position similar to 
that which an arginine originating from the LID domain would occupy, and is 
located 2.8 Å from PC of TP5A. Therefore, the inventors attribute to Arg15 a role 
similar to that of the LID arginines of UmpK. 

As pointed out previously (Lavie et al., 1997a; Lavie et al., 1997b), the binding of 
AZT-MP causes the P-loop of TmpK to shift by about 0.5 Å due to the interaction 
between the azido moiety of AZT and the side-chain of Asp14. This shift affects 
Arg15 as well, so that it is no longer optimally located to fulfill its catalytic 
role. Most, but not all, thymidylate kinases sequenced to date have an arginine in 
the P-loop. A notable exception is thymidylate kinase from E. coli, which has a 
glycine residue at that position (like UmpK). The LID domain of the E. coli TmpK 
contains basic residues, unlike yeast TmpK but like UmpK. The present inventors 
postulate that the E. coli TmpK functions very similarly to the slime mold UmpK, 
where arginine residues from the LID domain participate in catalysis (candidates 
are Arg147, Arg149, Arg151, and Arg156). This would suggest that a P-loop movement 
of E. coli TmpK, like that observed with the yeast enzyme and AZT-MP, would not 
have such a detrimental effect on catalysis, as the residues important for 
catalysis (except the conserved lysine) do not originate from the P-loop. To test 
this hypothesis, thymidylate kinase from E. coli was cloned, expressed and purified 
and its dTMP and AZT-MP phosphorylation rates were compared. 

Example 7: Steady-state kinetics of E. coli TmpK 

As predicted from the structural rationale explained above, the rates of 
phosphorylation of AZT-MP and dTMP by E. coli TmpK are comparable (Figure 3). 
In steady-state kinetics assays the catalytic activity of TmpK was measured with a 
coupled colorimetric assay essentially as described (Reinstein et al., 1988) with 
the following assay buffer: 100 mM Tris/HCl pH 7.5, 200 µM NADH, 400 µM 
phosphoenolpyruvate, 80 mM KCl, and 1.4 mM MgCl2 at 25 °C. In addition, the 
assay contained different concentrations of ATP and dTMP and 10 to 50 nM TmpK 
as indicated in Figure 3. The data were analyzed with the Michaelis-Menten 
equation and the non-linear regression program Grafit (Erithacus software). While 
in the case of yeast TmpK AZT-MP is phosphorylated 200-fold slower than dTMP, 
for the E. coli enzyme the factor is only 2.5 (Table 2), 



Table 2: Steady-state kinetic parameters 

                               yeast         E. coli 
Km for dTMP with ATP           9 µM          2.7 µM 
Km for AZTMP with ATP          6 µM          30 µM 
Km for ATP with dTMP           190 µM        8 µM 
Km for ATP with AZTMP          300 µM        50 µM 
kcat with ATP and dTMP         35 s⁻¹        15 s⁻¹ 
kcat with ATP and AZTMP        0.175 s⁻¹     6 s⁻¹ 
ratio kcat dTMP/AZTMP          200           2.5 
ratio kcat/Km dTMP/AZTMP       133           27.5 

illustrating the large difference in acceptance of AZT-MP by these two dTMP kinases 
from different organisms. It appears from this comparison that NMP kinases have 
developed different strategies to attain catalysis; one class has catalytic residues 
in the P-loop and the LID regions (AK, UmpK, etc.), whereas most dTMP kinases 
(E. coli TmpK being an exception) apparently lack the catalytic arginine residues 
of the LID region. In this respect it also has to be considered that the mechanism 
of phosphoryl transfer of the yeast TmpK may not be purely associative as was 
deduced for UmpK (Schlichting & Reinstein, 1997). 
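
The ratios in the last two rows of Table 2 follow directly from the tabulated Km 
and kcat values. The following minimal sketch recomputes them in Python; the data 
layout and the function names are illustrative assumptions, and the numbers are 
simply transcribed from Table 2 (nucleotide Km values measured with ATP). 

```python
# Kinetic parameters transcribed from Table 2 (Km in µM, kcat in 1/s).
# The dictionary layout and function names are illustrative, not from the source.
TABLE_2 = {
    "yeast": {
        "dTMP": {"km": 9.0, "kcat": 35.0},
        "AZTMP": {"km": 6.0, "kcat": 0.175},
    },
    "E. coli": {
        "dTMP": {"km": 2.7, "kcat": 15.0},
        "AZTMP": {"km": 30.0, "kcat": 6.0},
    },
}


def kcat_ratio(enzyme: str) -> float:
    """Return kcat(dTMP) / kcat(AZTMP) for the given enzyme."""
    params = TABLE_2[enzyme]
    return params["dTMP"]["kcat"] / params["AZTMP"]["kcat"]


def efficiency_ratio(enzyme: str) -> float:
    """Return (kcat/Km for dTMP) / (kcat/Km for AZTMP) for the given enzyme."""
    params = TABLE_2[enzyme]
    eff_dtmp = params["dTMP"]["kcat"] / params["dTMP"]["km"]
    eff_aztmp = params["AZTMP"]["kcat"] / params["AZTMP"]["km"]
    return eff_dtmp / eff_aztmp


if __name__ == "__main__":
    for enzyme in TABLE_2:
        print(
            f"{enzyme}: kcat ratio = {kcat_ratio(enzyme):.1f}, "
            f"kcat/Km ratio = {efficiency_ratio(enzyme):.1f}"
        )
```

With these inputs the kcat ratios come out at about 200 (yeast) and 2.5 (E. coli), 
and the kcat/Km ratios at about 133 and 28. The latter is close to the tabulated 
27.5; the small difference presumably reflects rounding of the tabulated 
parameters. 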

Example 8: Substitution of arginine 15 with glycine in the P-loop of yeast 
TmpK and replacement of the yeast LID region with the E. coli LID region 

In order to demonstrate that the observations described in Examples 1 to 7, supra, 
can indeed be applied to a nucleoside or nucleotide kinase naturally having only a 
low catalytic activity for nucleoside or nucleotide analogs, the yeast TmpK enzyme 
described above was mutated at position X3 in the consensus sequence of 
the P-loop (amino acid position 15 in the amino acid sequence of the yeast TmpK 
enzyme; see Figure 4) in order to substitute the amino acid arginine with glycine, 
the amino acid at the corresponding position in the P-loop of E. coli TmpK; see 
Figure 4. Furthermore, the LID sequence of the yeast TmpK enzyme from position 
131 to 150 (FLSTQ ... GDER) was replaced by the corresponding E. coli TmpK 
LID sequence from position 138 to 150 (YLDVTP ... ELDR); see Figure 4. The 
mutations referred to above were introduced into the yeast TmpK either alone or in 
combination by introducing the corresponding modifications in the DNA sequence 
underlying the amino acid sequence of the yeast TmpK as described in Sambrook 
et al., supra. The resultant enzymes were then tested and compared with respect 
to dTMP and AZT-MP phosphorylation rates as described in Example 7 and in 
Reinstein et al. (1988) with 10 mM MgCl2 and the following constant 
concentrations of nucleotides: 2 mM ATP and either 1 mM TMP or 1 mM AZT-MP. 
Furthermore, as a control the TmpK enzymes from yeast, E. coli, human, mouse 
and herpes simplex virus were tested. The results are shown in Table 3 below. 

Table 3 

Protein                          Activity TMP   Activity AZT-MP   Reduction 
yeast                            100            0.5               200 
E. coli                          43             21                2 
yeast R -> G                     0.5 
yeast -> LID E. coli             4              0 
yeast R -> G + -> LID E. coli    25             2                 12 
Human                            5              0.07              70 
Mouse                            20             0.8               25 
Herpes S.                        1              0.5               2 



As can be inferred from Table 3, the mutated version of the yeast TmpK enzyme 
that contains both the mutated P-loop and the E. coli LID region exhibits improved 
kinase activity for the nucleotide analog AZT-MP, which is about 30-fold higher than 
that of the corresponding human enzyme. Moreover, the AZT-MP activity of the 
mutated yeast TmpK exceeds that of the enzymes from mouse and herpes simplex 
virus, the latter of which has been speculated to be useful in gene therapy. 
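
The comparisons drawn from Table 3 are simple quotients of the tabulated 
activities. The sketch below, with illustrative names that are not part of the 
original disclosure, recomputes the "Reduction" column (activity with TMP divided by 
activity with AZT-MP) and the fold difference in AZT-MP activity between two 
enzymes. Only rows with both activities listed and a non-zero AZT-MP activity are 
included. 

```python
# Relative activities transcribed from Table 3 as (TMP, AZT-MP).
# Rows with a missing or zero AZT-MP entry are left out; names are illustrative.
TABLE_3 = {
    "yeast": (100.0, 0.5),
    "E. coli": (43.0, 21.0),
    "yeast R->G + LID E. coli": (25.0, 2.0),
    "human": (5.0, 0.07),
    "mouse": (20.0, 0.8),
    "herpes simplex": (1.0, 0.5),
}


def reduction(protein: str) -> float:
    """Return activity with TMP divided by activity with AZT-MP."""
    tmp, aztmp = TABLE_3[protein]
    return tmp / aztmp


def aztmp_fold_over(protein: str, reference: str) -> float:
    """Return AZT-MP activity of `protein` relative to that of `reference`."""
    return TABLE_3[protein][1] / TABLE_3[reference][1]


if __name__ == "__main__":
    mutant = "yeast R->G + LID E. coli"
    for name in TABLE_3:
        print(f"{name}: reduction = {reduction(name):.1f}")
    for ref in ("human", "mouse", "herpes simplex"):
        print(f"{mutant} vs {ref}: {aztmp_fold_over(mutant, ref):.1f}-fold")
```

For the double mutant this gives roughly 29-fold the AZT-MP activity of the human 
enzyme, in line with the approximately 30-fold figure stated above. 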

Example 9: Improvement of kinase activity of human thymidylate kinase 

Modifications at two different positions of a nucleoside monophosphate kinase 
sequence have been identified as means of obtaining a kinase with enhanced prodrug 
phosphorylation activity. Those positions are the P-loop region, the LID 
region, or both. The human thymidylate kinase (hTMPK) has been modified 
accordingly, with the result that higher azidothymidine monophosphate (AZTMP) 
phosphorylation activity was achieved. In addition, a modification outside of the 
P-loop or the LID region, namely that of phenylalanine 105 to tyrosine, also 
results in an hTMPK variant having the desired kinetic activity with AZTMP. The 
mutations in the amino acid sequence of hTMPK (SEQ ID NO: 5) and testing of the 
resultant polypeptides were performed as described in Example 8. The results are 
summarized in Table 4. 
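
The P-loop position targeted by the Arg16Gly substitution can be located 
programmatically. The sketch below is illustrative only: it assumes that the 
P-loop consensus takes the form G-X1-X2-X3-X4-G-K, uses the first 19 residues of 
hTMPK transcribed from SEQ ID NO: 5 into one-letter code, and introduces 
hypothetical helper names (`p_loop_positions`, `substitute`). 

```python
import re

# First 19 residues of hTMPK, transcribed from SEQ ID NO: 5 (one-letter code).
HTMPK_N_TERMINUS = "MAARRGALIVLEGVDRAGK"

# Assumed P-loop consensus G-X1-X2-X3-X4-G-K; each X is captured as its own group.
P_LOOP = re.compile(r"G(.)(.)(.)(.)GK")


def p_loop_positions(sequence: str):
    """Return {"X1": (position, residue), ...} for the first P-loop match.

    Positions are 1-based, as in the sequence listing. Returns None when the
    assumed consensus is not found.
    """
    match = P_LOOP.search(sequence)
    if match is None:
        return None
    return {f"X{i}": (match.start(i) + 1, match.group(i)) for i in range(1, 5)}


def substitute(sequence: str, position: int, new_residue: str) -> str:
    """Return a copy of `sequence` with the residue at 1-based `position` replaced."""
    index = position - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    return sequence[:index] + new_residue + sequence[index + 1:]


if __name__ == "__main__":
    positions = p_loop_positions(HTMPK_N_TERMINUS)
    print(positions)
    x3_position, _ = positions["X3"]
    print(substitute(HTMPK_N_TERMINUS, x3_position, "G"))
```

Under these assumptions X3 falls on arginine 16 of hTMPK, the residue exchanged 
for glycine in variants (B) and (C) of Table 4. 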



Table 4 

Protein                               Activity TMP   Activity AZTMP   ratio TMP/AZTMP 
yeast TMPK (WT†)                      100‡           0.5              200 
A) human TMPK (WT)                    1.9            0.019            100 
B) Arg16Gly + small E. coli LID§      1.9            0.64             33 
C) Arg16Gly + large E. coli LID¶      4.3            6.1              0.7 
D) Phe105Tyr                          0.475          0.665            0.7 

†WT: wild-type 

‡The rate of yeast TMPK with TMP is set at 100. 

§Small E. coli LID: exchange of hTMPK LID sequence with that of E. coli, residues 
145-148. 

¶Large E. coli LID: exchange of hTMPK LID sequence with that of E. coli, residues 
135-148. 

The modifications of hTMPK in the P-loop (Arg16Gly) and in the LID region (large 
E. coli LID) have resulted in a variant (marked (C) in the table above) with 
320-fold higher activity with AZTMP as substrate in comparison to the wild-type 
hTMPK. Moreover, the substrate specificity has been changed dramatically, such that 
variant (C) phosphorylates AZTMP at a faster rate than the physiological substrate 
TMP. This should improve its efficacy even further, since TTP and AZTTP 
compete for incorporation into nascent viral DNA chains (e.g. TTP and AZTTP 
compete as substrates for reverse transcriptase). Variant (D), the result of a single 
amino-acid change, likewise has a faster rate with AZTMP than with TMP, but is 
about 10 times slower than variant (C). 
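
The fold changes quoted for variants (C) and (D) can be checked against Table 4 
with a few lines of arithmetic. The following sketch uses illustrative names and 
only the rows discussed in the text (wild-type hTMPK and variants (C) and (D)); 
the values are transcribed from Table 4. 

```python
# Relative activities transcribed from Table 4 as (TMP, AZTMP).
# Keys follow the row labels of Table 4; function names are illustrative.
TABLE_4 = {
    "A": (1.9, 0.019),    # human TMPK, wild-type
    "C": (4.3, 6.1),      # Arg16Gly + large E. coli LID
    "D": (0.475, 0.665),  # Phe105Tyr
}


def aztmp_fold(variant: str, reference: str) -> float:
    """Return AZTMP activity of `variant` divided by that of `reference`."""
    return TABLE_4[variant][1] / TABLE_4[reference][1]


def tmp_over_aztmp(variant: str) -> float:
    """Return the TMP/AZTMP activity ratio; values below 1 favor AZTMP."""
    tmp, aztmp = TABLE_4[variant]
    return tmp / aztmp


if __name__ == "__main__":
    print(f"C vs wild-type hTMPK: {aztmp_fold('C', 'A'):.0f}-fold")
    print(f"C vs D: {aztmp_fold('C', 'D'):.1f}-fold")
    for variant in ("A", "C", "D"):
        print(f"{variant}: TMP/AZTMP = {tmp_over_aztmp(variant):.2f}")
```

The quotient for (C) relative to wild-type hTMPK is about 321, matching the 
320-fold figure, and (C) is about 9 times faster than (D) with AZTMP. 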

The results obtained in accordance with the present invention are of far-reaching 
medicinal importance since they for the first time enable a rational approach to 
using gene therapy to improve the antiviral impact of nucleoside and nucleotide 
analogs such as AZT, as outlined previously (Balzarini et al., 1988; Guettari et 
al., 1997; Lavie et al., 1997a). 

Claims 

1. A method for the production of a polypeptide having or having enhanced 
kinase activity for a nucleoside or nucleotide analog, said method 
comprising substituting, adding or deleting at least one amino acid of a 
protein having nucleoside or nucleotide kinase activity at a position in the 
protein where: 

(a) the amino acid is at position X2 and/or X3 in the consensus sequence 
GX1X2X3X4GK of the P-loop; 

(b) the amino acid is in the LID region; and/or 

(c) the amino acid is at position 105 in the amino acid sequence of 
human thymidylate kinase or at a corresponding position in a protein 
having nucleoside or nucleotide kinase activity. 

2. The method of claim 1, wherein said nucleoside is adenosine, cytidine, 
guanosine, thymidine or uridine or based on any of these. 

3. The method of claim 1 or 2, wherein said nucleotide is a nucleoside 
monophosphate. 

4. The method of claim 3, wherein said nucleoside monophosphate is 
thymidylate. 

5. The method of any one of claims 1 to 4, wherein said protein is derived from 
a eukaryotic or prokaryotic organism. 

6. The method of claim 5, wherein said organism is human or a yeast. 

7. The method of any one of claims 1 to 6, wherein said protein comprises the 
amino acid sequence of any one of SEQ ID NOS: 1 to 13 or a fragment 
thereof. 

8. The method of any one of claims 1 to 7, wherein said amino acid which is 
substituted or added in (a) is glutamic acid, glycine or lysine and the amino 
acid substituted in (c) is tyrosine. 

9. The method of any one of claims 1 to 8, wherein said amino acid which is 
substituted or added in (b) is a basic amino acid, preferably arginine. 

10. The method of any one of claims 1 to 9 where the LID has the consensus 
sequence R/KXXXXXERYEXXXXQ. 

11. The method of any one of claims 1 to 10, wherein said polypeptide exhibits 
kinase activity for a nucleoside or nucleotide analog which is higher than 
that of the/a corresponding wild type eukaryotic, preferably human enzyme. 

12. The method of any one of claims 1 to 11, wherein said nucleoside analog is 
AZT, d4T or has the following structure: 

[structural formula not reproduced] 

and 

[structural formula not reproduced] 

wherein B is any nucleobase or analog thereof, and X is O, CH2, NH or S. 

13. The method of any one of claims 1 to 12, wherein the amino acid 
substitution(s) result in the P-loop and/or in the LID region of a bacterial 
nucleoside or nucleotide kinase, preferably those of the TmpK of E. coli. 

14. A polynucleotide encoding the polypeptide obtainable by the method of any 
one of claims 1 to 13. 

15. A vector containing the polynucleotide of claim 14. 

16. The vector of claim 15, wherein the polynucleotide is operatively linked to 
expression control sequences allowing expression in prokaryotic or 
eukaryotic cells. 

17. The vector of claim 15 or 16, which is a gene transfer or a gene targeting 
vector. 

18. A host cell genetically engineered with the vector of any one of claims 15 to 
17. 

19. A method for producing a polypeptide having nucleoside or nucleotide 
kinase activity for a nucleoside or nucleotide analog comprising 

(a) culturing the host cell of claim 18, and 

(b) recovering said polypeptide from the culture. 

20. A method for producing cells capable of expressing a polypeptide having 
nucleoside or nucleotide kinase activity for nucleoside or nucleotide analogs 
comprising genetically engineering cells with the polynucleotide of claim 14, 
or with the vector of any one of claims 15 to 17. 

21. A polypeptide having nucleoside or nucleotide kinase activity for a 
nucleoside or nucleotide analog encoded by the polynucleotide of claim 14, 
obtainable by the method of any one of claims 1 to 13 or 19 or from cells 
produced by the method of claim 20 or comprising a biologically active 
fragment of any of these. 

22. An antibody specifically recognizing the polypeptide of claim 21. 

23. A composition comprising 

(a) a prokaryotic protein having nucleoside or nucleotide kinase activity 
for a nucleoside or nucleotide analog or a polynucleotide encoding 
and capable of expressing said protein in vivo or a vector containing 
said polynucleotide; or 

(b) the polypeptide of claim 21, the polynucleotide of claim 14 or the 
vector of any one of claims 15 to 17; 

(c) optionally a nucleoside or nucleotide analog; and 

(d) optionally a pharmaceutically acceptable carrier. 

24. The composition of claim 23, wherein said protein is a bacterial nucleoside 
or nucleotide kinase. 

25. The composition of claim 23 or 24, wherein said protein is a bacterial TmpK. 

26. The composition of any one of claims 23 to 25, wherein said protein has at 
least the P-loop and/or the LID region of E. coli TmpK. 

27. The composition of claim 25 or 26, wherein said TmpK comprises the amino 
acid sequence shown in SEQ ID NO: 4 or a biologically active fragment 
thereof. 

28. A kit comprising the polynucleotide of claim 14, the vector of any one of 
claims 15 to 17, the protein of claim 21 or the antibody of claim 22, and 
optionally a nucleoside or nucleotide analog. 

29. A method for identifying an inhibitor of a nucleoside or nucleotide kinase 
comprising the steps of: 

(a) contacting the polypeptide of claim 21 or a cell expressing said 
polypeptide in the presence of components capable of providing a 
detectable signal in response to kinase activity, with a compound to 
be screened under conditions that permit binding of said compound 
to the nucleoside or nucleotide kinase, and 

(b) detecting presence or absence of a signal generated from the kinase 
activity of the polypeptide, wherein the absence or decrease of the 
signal is indicative for an inhibitor of a nucleoside or nucleotide 
kinase. 

30. A method for identifying a nucleoside or nucleotide based prodrug 
comprising the steps of 

(a) contacting the polypeptide of claim 21 or a cell expressing said 
polypeptide in the presence of components capable of providing a 
detectable signal in response to kinase activity, with a nucleoside or 
nucleotide analog to be screened under conditions that permit kinase 
activity of said polypeptide, and 

(b) detecting presence or absence of a signal generated from the kinase 
activity of the polypeptide, wherein the presence of a signal is 
indicative for a putative prodrug. 

31. A method for the production of a pharmaceutical composition comprising 
the steps of 

(a) contacting the polypeptide of claim 21 or a cell expressing said 
polypeptide in the presence of components capable of providing a 
detectable signal in response to kinase activity, with a compound to 
be screened under conditions that permit binding of said compound 
to the nucleoside or nucleotide kinase, and 

(b) detecting presence or absence of a signal generated from the kinase 
activity of the polypeptide, wherein the absence or decrease of the 
signal is indicative for an inhibitor of a nucleoside or nucleotide 
kinase, or 

(a') contacting the polypeptide of claim 21 or a cell expressing said 
polypeptide in the presence of components capable of providing a 
detectable signal in response to kinase activity, with a nucleoside or 
nucleotide analog to be screened under conditions that permit kinase 
activity of said polypeptide, and 

(b') detecting presence or absence of a signal generated from the kinase 
activity of the polypeptide, wherein the presence of a signal is 
indicative for a putative prodrug; and 

(c) formulating the inhibitor identified in step (b) or the nucleoside or 
nucleotide analog identified in step (b') in a pharmaceutically 
acceptable form. 

32. Use of 

(a) a prokaryotic protein having nucleoside or nucleotide kinase activity 
for a nucleoside or nucleotide analog or a polynucleotide encoding 
and capable of expressing said protein in vivo or a vector containing 
said polynucleotide, 

(b) the polypeptide of claim 21, the polynucleotide of claim 14 or the 
vector of any one of claims 15 to 17; and/or 

(c) the nucleoside or nucleotide analog identified in the method of claim 
30 

for the preparation of a pharmaceutical composition for the activation of 
nucleoside or nucleotide analogs or nucleoside or nucleotide based 
prodrugs and/or for the treatment of viral infections and/or diseases or 
cancer. 

33. The use of claim 32, wherein said activation results in a cytotoxic 
nucleoside or nucleotide. 

34. The use of claim 32 or 33, wherein said viral infection is HIV infection. 

35. Use of the inhibitor obtainable by the method of claim 31 or 32 for the 
preparation of a pharmaceutical composition for inhibiting virus replication or 
for treating cancer. 

36. A method for the preparative synthesis of a nucleoside phosphate analog 
comprising: 

(a) using a polynucleotide of claim 14 or as defined in any one of claim 
23 to 27 in a noncellular system or in a cell ex vivo, and 

(b) formulating the cells modified in step (a) in a pharmaceutically 
acceptable form. 

37. The composition of any one of claims 23 to 27 or the use of any one of 
claims 32 to 34, wherein said composition is a pharmaceutical composition 
and further comprises or is designed to be administered with a nucleoside 
or nucleotide analog, preferably AZT or d4T. 

38. Use of 

(a) a prokaryotic protein having nucleoside or nucleotide kinase activity 
for a nucleoside or nucleotide analog or a polynucleotide encoding 
and capable of expressing said protein in vivo or a vector containing 
said polynucleotide, 

(b) the polypeptide of claim 21, the polynucleotide of claim 14 or the 
vector of any one of claims 15 to 17 

for the preparation of nucleoside phosphates or analogs and derivatives 
thereof. 

SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: Max-Planck-Gesellschaft zur Foerderung der Wissenschaften e.V. 

(B) STREET: none 

(C) CITY: Berlin 

(D) STATE: none 

(E) COUNTRY: Germany 

(F) POSTAL CODE (ZIP): none 

(ii) TITLE OF INVENTION: Novel means and methods for the preparation 
and activation of nucleoside and nucleotide based drugs 

(iii) NUMBER OF SEQUENCES: 15 

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO) 



(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 240 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(iii) HYPOTHETICAL: NO 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

Met Arg Gly Ile Leu Ile Thr Ile Glu Gly Ile Asn Gly Val Gly Lys 
1 5 10 15 

Ser Thr Gln Ala Met Arg Leu Lys Lys Ala Leu Glu Cys Met Asp Tyr 
20 25 30 

Asn Ala Val Cys Ile Arg Phe Pro Asn Pro Asp Thr Thr Thr Gly Gly 
35 40 45 

Leu Ile Leu Gln Val Leu Asn Lys Met Thr Glu Met Ser Ser Glu Gln 
50 55 60 

Leu His Lys Leu Phe Thr Lys His His Ser Glu Phe Ser Ala Glu Ile 
65 70 75 80 

Ala Ala Leu Leu Lys Leu Asn Phe Ile Val Ile Val Asp His Tyr Ile 
85 90 95 

Trp Ser Gly Leu Ala Tyr Ala Gln Ala Asp Gly Ile Thr Ile Glu Thr 
100 105 110 

Lys Asn Ile Phe Lys Pro Asp Tyr Thr Phe Phe Leu Ser Ser Lys Lys 
115 120 125 

Pro Leu Asn Glu Lys Pro Leu Thr Leu Gln Arg Leu Phe Glu Thr Lys 
130 135 140 

Glu Lys Gln Glu Thr Ile Phe Thr Asn Phe Thr Ile Ile Met Asn Asp 
145 150 155 160 

Val Pro Lys Asn Arg Leu Cys Ile Ile Pro Ala Thr Leu Asn Lys Glu 
165 170 175 

Ile Ile His Thr Met Ile Leu Thr Lys Thr Ile Lys Val Phe Asp Asn 
180 185 190 

Asn Ser Cys Leu Asn Tyr Ile Lys Met Tyr Asp Asp Lys Tyr Leu Asn 
195 200 205 

Val Gln Asp Leu Asn Leu Phe Asp Phe Asp Trp Gln Lys Cys Ile Glu 
210 215 220 

Asp Asn Asn Asp Lys Glu Glu Tyr Asp Asp Asp Asp Gly Phe Ile Ile 
225 230 235 240 

(2) INFORMATION FOR SEQ ID NO : 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 212 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(iii) HYPOTHETICAL: NO 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 2: 

Met Ser Gly Leu Phe Ile Thr Phe Glu Gly Pro Glu Gly Ala Gly Lys 
1 5 10 15 

Thr Thr Val Leu Gln Glu Ile Lys Asn Ile Leu Thr Ala Glu Gly Leu 
20 25 30 

Gln Val Met Ala Thr Arg Glu Pro Gly Gly Ile Asp Ile Ala Glu Gln 
35 40 45 

Ile Arg Glu Val Ile Leu Asn Glu Asn Asn Ile Leu Met Asp Pro Lys 
50 55 60 

Thr Glu Ala Leu Leu Tyr Ala Ala Ala Arg Arg Gln His Leu Val Glu 
65 70 75 80 

Lys Val Lys Pro Ala Leu Glu Gln Gly Phe Ile Val Leu Cys Asp Arg 
85 90 95 

Phe Ile Asp Ser Pro Leu Ala Tyr Gln Gly Tyr Ala Arg Gly Leu Gly 
100 105 110 

Ile Asp Glu Val Leu Ser Ile Asn Glu Phe Ala Ile Gly Asp Met Met 
115 120 125 

Pro His Val Thr Val Tyr Phe Ser Ile Asp Pro Glu Glu Gly Leu Lys 
130 135 140 

Arg Ile Tyr Ala Asn Gly Ser Arg Glu Lys Asn Arg Leu Asp Leu Glu 
145 150 155 160 

Lys Leu Asp Phe His Thr Lys Val Gln Glu Gly Tyr Gln Glu Leu Met 
165 170 175 

Lys Arg Phe Pro Glu Arg Phe His Ser Val Asp Ala Gly Gln Ser Lys 
180 185 190 

Asp Leu Val Val Gln Asp Val Leu Lys Val Ile Asp Glu Ala Leu Lys 
195 200 205 

Lys Ile Gln Leu 
210 



(2) INFORMATION FOR SEQ ID NO : 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 213 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(iii) HYPOTHETICAL: NO 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 3: 

Met Arg Ser Lys Tyr Ile Val Ile Glu Gly Leu Glu Gly Ala Gly Lys 
1 5 10 15 

Thr Thr Ala Arg Asn Val Val Val Glu Thr Leu Glu Gln Leu Gly Ile 
20 25 30 



WO 99/41404 



4/15 



PCT/EP99/00945 



Arg Asp Met Val Phe Thr Arg Glu Pro Gly Gly Thr Gln Leu Ala Glu 
35 40 45 

Lys Leu Arg Ser Leu Val Leu Asp Ile Lys Ser Val Gly Asp Glu Val 
50 55 60 

Ile Thr Asp Lys Ala Glu Val Leu Met Phe Tyr Ala Ala Arg Val Gln 
65 70 75 80 

Leu Val Glu Thr Val Ile Lys Pro Ala Leu Ala Asn Gly Thr Trp Val 
85 90 95 

Ile Gly Asp Arg His Asp Leu Ser Thr Gln Ala Tyr Gln Gly Gly Gly 
100 105 110 

Arg Gly Ile Asp Gln His Met Leu Ala Thr Leu Arg Asp Ala Val Leu 
115 120 125 

Gly Asp Phe Arg Pro Asp Leu Thr Leu Tyr Leu Asp Val Thr Pro Glu 
130 135 140 

Val Gly Leu Lys Arg Ala Arg Ala Arg Gly Glu Leu Asp Arg Ile Glu 
145 150 155 160 

Gln Glu Ser Phe Asp Phe Phe Asn Arg Thr Arg Ala Arg Tyr Leu Glu 
165 170 175 

Leu Ala Ala Gln Asp Lys Ser Ile His Thr Ile Asp Ala Thr Gln Pro 
180 185 190 

Leu Glu Ala Val Met Asp Ala Ile Arg Thr Thr Val Thr His Trp Val 
195 200 205 

Lys Glu Leu Asp Ala 
210 



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 210 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(iii) HYPOTHETICAL: NO 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Lys Gly Lys Phe Ile Val Ile Glu Gly Leu Glu Gly Ala Gly Lys 
1 5 10 15 

Ser Ser Ala His Gln Ser Val Val Arg Val Leu His Glu Leu Gly Ile 
20 25 30 

Gln Asp Val Val Phe Thr Arg Glu Pro Gly Gly Thr Pro Leu Ala Glu 
35 40 45 

Lys Leu Arg His Leu Ile Lys His Glu Thr Glu Glu Pro Val Thr Asp 
50 55 60 

Lys Ala Glu Leu Leu Met Leu Tyr Ala Ala Arg Ile Gln Leu Val Glu 
65 70 75 80 

Asn Val Ile Lys Pro Ala Leu Met Gln Gly Lys Trp Val Val Gly Asp 
85 90 95 

Arg His Asp Met Ser Ser Gln Ala Tyr Gln Gly Gly Gly Arg Gln Leu 
100 105 110 

Asp Pro His Phe Met Leu Thr Leu Lys Glu Thr Val Leu Gly Asn Phe 
115 120 125 

Glu Pro Asp Leu Thr Ile Tyr Leu Asp Ile Asp Pro Ser Val Gly Leu 
130 135 140 

Ala Arg Ala Arg Gly Arg Gly Glu Leu Asp Arg Ile Glu Gln Met Asp 
145 150 155 160 

Leu Asp Phe Phe His Arg Thr Arg Ala Arg Tyr Leu Glu Leu Val Lys 
165 170 175 

Asp Asn Pro Lys Ala Val Val Ile Asn Ala Glu Gln Ser Ile Glu Leu 
180 185 190 

Val Gln Ala Asp Ile Glu Ser Ala Val Lys Asn Trp Trp Lys Ser Asn 
195 200 205 

Glu Lys 
210 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 212 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(iii) HYPOTHETICAL: NO 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

Met Ala Ala Arg Arg Gly Ala Leu Ile Val Leu Glu Gly Val Asp Arg 
1 5 10 15 

Ala Gly Lys Ser Thr Gln Ser Arg Lys Leu Val Glu Ala Leu Cys Ala 
20 25 30 

Ala Gly His Arg Ala Glu Leu Leu Arg Phe Pro Glu Arg Ser Thr Glu 
35 40 45 

Ile Gly Lys Leu Leu Ser Ser Tyr Leu Gln Lys Lys Ser Asp Val Glu 
50 55 60 

Asp His Ser Val His Leu Leu Phe Ser Ala Asn Arg Trp Glu Gln Val 
65 70 75 80 

Pro Leu Ile Lys Glu Lys Leu Ser Gln Gly Val Thr Leu Val Val Asp 
85 90 95 

Arg Tyr Ala Phe Ser Gly Val Ala Phe Thr Gly Ala Lys Glu Asn Phe 
100 105 110 

Ser Leu Asp Trp Cys Lys Gln Pro Asp Val Gly Leu Pro Lys Pro Asp 
115 120 125 

Leu Val Leu Phe Leu Gln Leu Gln Leu Ala Asp Ala Ala Lys Arg Gly 
130 135 140 

Ala Phe Gly His Glu Arg Tyr Glu Asn Gly Ala Phe Gln Glu Arg Ala 
145 150 155 160 

Leu Arg Cys Phe His Gln Leu Met Lys Asp Thr Thr Leu Asn Trp Lys 
165 170 175 

Met Val Asp Ala Ser Lys Arg Leu Glu Ala Val His Glu Glu Leu Arg 
180 185 190 

Val Leu Ser Glu Asp Ala Ile Arg Thr Ala Thr Glu Lys Pro Leu Gly 
195 200 205 

Glu Leu Trp Lys 
210 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 188 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(iii) HYPOTHETICAL: NO 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Val Asp Asn Met Phe Ile Val Phe Glu Gly Ile Asp Gly Ser Gly 
1 5 10 15 

Lys Thr Thr Gln Ser Lys Leu Leu Ala Lys Lys Met Asp Ala Phe Trp 
20 25 30 

Thr Tyr Glu Pro Ser Asn Ser Leu Val Gly Lys Ile Ile Arg Glu Ile 
35 40 45 

Leu Ser Gly Lys Thr Glu Val Asp Asn Lys Thr Leu Ala Leu Leu Phe 
50 55 60 

Ala Ala Asp Arg Ile Glu His Thr Lys Leu Ile Lys Glu Glu Leu Lys 
65 70 75 80 

Lys Arg Asp Val Val Cys Asp Arg Tyr Leu Tyr Ser Ser Ile Ala Tyr 
85 90 95 

Gln Ser Val Ala Gly Val Asp Glu Asn Phe Ile Lys Ser Ile Asn Arg 
100 105 110 

Tyr Ala Leu Lys Pro Asp Ile Val Phe Leu Leu Ile Val Asp Ile Glu 
115 120 125 

Thr Ala Leu Lys Arg Val Lys Thr Lys Asp Ile Phe Glu Lys Lys Asp 
130 135 140 

Phe Leu Lys Lys Val Gln Asp Lys Tyr Leu Glu Leu Ala Glu Glu Tyr 
145 150 155 160 

Asn Phe Ile Val Ile Asp Thr Thr Lys Lys Ser Val Glu Glu Val His 
165 170 175 

Asn Glu Ile Ile Gly Tyr Leu Lys Asn Ile Pro His 
180 185 



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 227 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Met Ala Ser Arg Arg Gly Ala Leu Ile Val Leu Glu Gly Val Asp Arg
1 5 10 15

Ala Gly Lys Thr Thr Gln Gly Leu Lys Leu Val Thr Ala Leu Cys Ala
20 25 30

Ser Gly His Arg Ala Glu Leu Leu Arg Phe Pro Glu Arg Ser Thr Glu
35 40 45

Ile Gly Lys Leu Leu Asn Ser Tyr Leu Glu Lys Lys Thr Glu Leu Glu
50 55 60

Asp His Ser Val His Leu Leu Phe Ser Ala Asn Arg Trp Glu Gln Val
65 70 75 80

Pro Leu Ile Lys Ala Lys Leu Asn Gln Gly Val Thr Leu Val Leu Asp
85 90 95

Arg Tyr Ala Phe Ser Gly Val Ala Phe Thr Gly Ala Lys Glu Asn Phe
100 105 110

Ser Leu Asp Trp Cys Lys Gln Pro Asp Val Gly Leu Pro Lys Pro Asp
115 120 125

Leu Ile Leu Phe Leu Gln Leu Gln Leu Leu Asp Ala Ala Ala Arg Gly
130 135 140

Glu Phe Gly Leu Glu Arg Tyr Glu Thr Gly Thr Phe Gln Lys Gln Val
145 150 155 160

Leu Leu Cys Phe Gln Gln Leu Met Glu Glu Lys Asn Leu Asn Trp Lys
165 170 175

Val Val Asp Ala Ser Lys Arg Thr Pro Ser Glu Thr Leu His Arg Gly
180 185 190

His Trp Gly Ser Tyr Gly Asn Lys Ser Ala Ser Ile Ala Asn Thr Ile
195 200 205

Phe Trp Phe Cys Lys Arg Leu Val Glu Gly Ser His Leu Tyr Thr Ile
210 215 220

Ser Arg Ser
225



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 210 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Met Lys Gln Gly Val Phe Val Ala Ile Glu Gly Val Asp Gly Ala Gly
1 5 10 15

Lys Thr Val Leu Leu Glu Ala Phe Lys Gln Arg Phe Pro Gln Ser Phe
20 25 30

Leu Gly Phe Lys Thr Leu Phe Ser Arg Glu Pro Gly Gly Thr Pro Leu
35 40 45

Ala Glu Lys Ile Arg Ala Leu Leu Leu His Glu Ala Met Glu Pro Leu
50 55 60

Thr Glu Ala Tyr Leu Phe Ala Ala Ser Arg Thr Glu His Val Arg Gln
65 70 75 80

Leu Ile Gln Pro Ala Leu Gln Gln Lys Gln Leu Val Ile Val Asp Arg
85 90 95

Phe Val Trp Ser Ser Tyr Ala Tyr Gln Gly Leu Ile Lys Lys Val Gly
100 105 110

Leu Asp Val Val Lys Lys Leu Asn Ala Asp Ala Val Gly Asp Ser Met
115 120 125

Pro Asp Phe Thr Phe Ile Val Asp Cys Asp Phe Glu Thr Ala Leu Asn
130 135 140

Arg Met Ala Lys Arg Gly Gln Asp Asn Leu Leu Asp Asn Thr Val Lys
145 150 155 160

Lys Gln Ala Asp Phe Asn Thr Met Arg Gln Tyr Tyr His Ser Leu Val
165 170 175

Asp Asn Lys Arg Val Phe Leu Leu Asp Gly Gln Asn Gln Thr Gly Cys
180 185 190

Leu Glu Gln Phe Ile Glu Gln Leu Ser Gln Cys Leu Thr Gln Pro Thr
195 200 205

Leu Ser
210



(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 210 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Met Asn Lys Gly Val Phe Val Val Ile Glu Gly Val Asp Gly Ala Gly
1 5 10 15

Lys Thr Ala Leu Ile Glu Gly Phe Lys Lys Leu Tyr Pro Thr Lys Phe
20 25 30

Leu Asn Tyr Gln Leu Thr Tyr Thr Arg Glu Pro Gly Gly Thr Leu Leu
35 40 45

Ala Glu Lys Ile Arg Gln Leu Leu Leu Asn Glu Thr Met Glu Pro Leu
50 55 60

Thr Glu Ala Tyr Leu Phe Ala Ala Ala Arg Thr Glu His Ile Ser Lys
65 70 75 80

Leu Ile Lys Pro Ala Ile Glu Lys Glu Gln Leu Val Ile Ser Asp Arg
85 90 95

Phe Val Phe Ser Ser Phe Ala Tyr Gln Gly Leu Ser Lys Lys Ile Gly
100 105 110

Ile Asp Thr Val Lys Gln Ile Asn His His Ala Leu Arg Asn Met Met
115 120 125

Pro Asn Phe Thr Phe Ile Leu Asp Cys Asn Phe Lys Glu Ala Leu Gln
130 135 140

Arg Met Gln Lys Arg Gly Asn Asp Asn Leu Leu Asp Glu Phe Ile Lys
145 150 155 160

Gly Lys Asn Asp Phe Asp Thr Val Arg Ser Tyr Tyr Leu Ser Leu Val
165 170 175

Asp Lys Lys Asn Cys Phe Leu Ile Asn Gly Asp Asn Lys Gln Glu His
180 185 190

Leu Glu Lys Phe Ile Glu Leu Leu Thr Arg Cys Leu Gln Gln Pro Thr
195 200 205

His Tyr
210



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 210 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Met Ser Lys Gln Asn Arg Gly Arg Leu Ile Val Ile Glu Gly Leu Asp
1 5 10 15

Arg Ser Gly Lys Ser Thr Gln Cys Gln Leu Leu Val Asp Lys Leu Ile
20 25 30

Leu Asn Met Lys Arg Leu Lys Leu Phe Lys Phe Pro Asp Arg Thr Thr
35 40 45

Ala Ile Gly Lys Lys Ile Asp Asp Tyr Leu Thr Glu Ser Val Gln Leu
50 55 60

Asn Asp Gln Val Ile His Leu Leu Phe Ser Ala Asn Arg Trp Glu Pro
65 70 75 80

Ser Ile Tyr Tyr Arg Ala Asn Gln Gln Arg Cys Asn Cys Ile Leu Asp
85 90 95

Arg Tyr Ala Phe Ser Gly Ile Ala Phe Ser Ala Ala Lys Gly Leu Asp
100 105 110

Trp Glu Trp Cys Lys Ser Pro Asp Arg Gly Leu Thr Arg Pro Asp Leu
115 120 125

Val Ile Phe Leu Asn Val Asp Pro Arg Ile Ala Ala Thr Arg Gly Gln
130 135 140

Tyr Gly Glu Glu Arg Tyr Glu Lys Ile Glu Met Gln Glu Lys Val Leu
145 150 155 160

Lys Asn Leu Gln Arg Leu Gln Lys Glu Phe Arg Glu Glu Gly Leu Glu
165 170 175

Phe Ile Thr Leu Asp Ala Ser Ser Tyr Ala Leu Glu Asp Val Asp Ser
180 185 190

Gln Ile Val Asp Leu Val Ser Asn Val Asn Ile His Glu Thr Leu Asp
195 200 205

Val Leu
210



(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 204 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Met Ser Arg Gly Ala Leu Ile Val Phe Glu Gly Leu Asp Lys Ser Gly
1 5 10 15

Lys Thr Thr Gln Cys Met Asn Ile Met Glu Ser Ile Pro Ala Asn Thr
20 25 30

Ile Lys Tyr Leu Asn Phe Pro Gln Arg Ser Thr Val Thr Gly Lys Met
35 40 45

Ile Asp Asp Tyr Leu Thr Arg Lys Lys Thr Tyr Asn Asp His Ile Val
50 55 60

Asn Leu Leu Phe Cys Ala Asn Arg Trp Glu Phe Ala Ser Phe Ile Gln
65 70 75 80

Glu Gln Leu Glu Gln Gly Ile Thr Leu Ile Val Asp Arg Tyr Ala Phe
85 90 95

Ser Gly Val Ala Tyr Ala Ala Ala Lys Gly Ala Ser Met Thr Leu Ser
100 105 110

Lys Ser Tyr Glu Ser Gly Leu Pro Lys Pro Asp Leu Val Ile Phe Leu
115 120 125

Glu Ser Gly Ser Lys Glu Ile Asn Arg Asn Val Gly Glu Glu Ile Tyr
130 135 140

Glu Asp Val Thr Phe Gln Gln Lys Val Leu Gln Glu Tyr Lys Lys Met
145 150 155 160

Ile Glu Glu Gly Asp Ile His Trp Gln Ile Ile Ser Ser Glu Phe Glu
165 170 175

Glu Asp Val Lys Lys Glu Leu Ile Lys Asn Ile Val Ile Glu Ala Ile
180 185 190

His Thr Val Thr Gly Pro Val Gly Gln Leu Trp Met
195 200



(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 205 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Met Ser Arg Gly Ala Leu Ile Val Phe Glu Gly Leu Asp Lys Ser Gly
1 5 10 15

Lys Thr Thr Gln Cys Met Asn Ile Met Glu Ser Ile Pro Thr Asn Thr
20 25 30

Ile Lys Tyr Leu Asn Phe Pro Gln Arg Ser Thr Val Thr Gly Lys Met
35 40 45

Ile Asp Asp Tyr Leu Thr Arg Lys Lys Thr Tyr Asn Asp His Ile Val
50 55 60

Asn Leu Leu Phe Cys Ala Asn Arg Trp Glu Phe Ala Ser Phe Ile Gln
65 70 75 80

Glu Gln Leu Glu Gln Gly Ile Thr Leu Ile Val Asp Arg Tyr Ala Phe
85 90 95

Ser Gly Val Ala Tyr Ala Thr Ala Lys Gly Ala Ser Met Thr Leu Ser
100 105 110

Lys Ser Tyr Glu Ser Gly Leu Pro Lys Pro Asp Leu Val Ile Phe Leu
115 120 125

Glu Ser Gly Ser Lys Glu Ile Asn Arg Asn Val Gly Glu Glu Ile Tyr
130 135 140

Glu Asp Val Ala Phe Gln Gln Lys Val Leu Gln Glu Tyr Lys Lys Met
145 150 155 160

Ile Glu Glu Gly Glu Asp Ile His Trp Gln Ile Ile Ser Ser Glu Phe
165 170 175

Glu Glu Asp Val Lys Lys Glu Leu Ile Lys Asn Ile Val Ile Glu Ala
180 185 190

Ile His Thr Val Thr Gly Pro Val Gly Gln Leu Trp Met
195 200 205



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 216 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Met Met Gly Arg Gly Lys Leu Ile Leu Ile Glu Gly Leu Asp Arg Thr
1 5 10 15

Gly Lys Thr Thr Gln Cys Asn Ile Leu Tyr Lys Lys Leu Gln Pro Asn
20 25 30

Cys Lys Leu Leu Lys Phe Pro Glu Arg Ser Thr Arg Ile Gly Gly Leu
35 40 45

Ile Asn Glu Tyr Leu Thr Asp Asp Ser Phe Gln Leu Ser Asp Gln Ala
50 55 60

Ile His Leu Leu Phe Ser Ala Asn Arg Trp Glu Ile Val Asp Lys Ile
65 70 75 80

Lys Lys Asp Leu Leu Glu Gly Lys Asn Ile Val Met Asp Arg Tyr Val
85 90 95

Tyr Ser Gly Val Ala Tyr Ser Ala Ala Lys Gly Thr Asn Gly Met Asp
100 105 110

Leu Asp Trp Cys Leu Gln Pro Asp Val Gly Leu Leu Lys Pro Asp Leu
115 120 125

Thr Leu Phe Leu Ser Thr Gln Asp Val Asp Asn Asn Ala Glu Lys Ser
130 135 140

Gly Phe Gly Asp Glu Arg Tyr Glu Thr Val Lys Phe Gln Glu Lys Val
145 150 155 160

Lys Gln Thr Phe Met Lys Leu Leu Asp Lys Glu Ile Arg Lys Gly Asp
165 170 175

Glu Ser Ile Thr Ile Val Asp Val Thr Asn Lys Gly Ile Gln Glu Val
180 185 190

Glu Ala Leu Ile Trp Gln Ile Val Glu Pro Val Leu Ser Thr His Ile
195 200 205

Asp His Asp Lys Phe Ser Phe Phe
210 215


(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "oligonucleotide" 

(iii) HYPOTHETICAL: YES 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

GGAATTCCAT ATGCGCAGTA AGTATATCGT C 31 



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "oligonucleotide" 

(iii) HYPOTHETICAL: YES 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 



CGCGGATCCT CATGCGTCCA ACTCCTTCAC CCAG 34



